Going on a bug(?) hunt


I now have some code that is meant to parse an XML file of approximately 5 billion lines.

Flow of data in a typical parser
Flow of data in a typical parser (Photo credit: Wikipedia)

Unfortunately it fails, every time (it seems), on line 4,295,025,275.

This is something of a nightmare to debug – but it looks like an overflow bug (in the xerces-c parser) of some sort.

Do I try to find it by inspection or by repeating the runs (it takes about 4 – 5 hours to get to the bug point)?

One is probably quite difficult but simple to organise. The other is (relatively) easier – just step through the code – but is perhaps impossible to organise – how many weeks of wall clock time in a debugger before we get to that instance?