I now have some code that is meant to parse an XML file of approximately 5 billion lines.
Unfortunately it fails, every time (it seems), on line 4,295,025,275.
This is something of a nightmare to debug – but it looks like an overflow bug (in the xerces-c parser) of some sort.
Do I try to find it by inspection or by repeating the runs (it takes about 4 – 5 hours to get to the bug point)?
One is probably quite difficult but simple to organise. The other is (relatively) easier – just step through the code – but is perhaps impossible to organise – how many weeks of wall clock time in a debugger before we get to that instance?