This may be useful to someone; it was for me.
To delete (say) between lines 10 and 1000000000 (inclusive) use sed:
sed 10,1000000000d <infile >outfile
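For anyone without sed to hand, the same range deletion can be sketched in Python. This is only an illustration, not a replacement for the one-liner above: the function names `delete_line_range` and `delete_range_in_file` are made up for this example. It streams the file line by line, so it should cope with very large inputs, and Python integers do not overflow, so line numbers beyond 2**32 are not a problem.

```python
def delete_line_range(lines, first, last):
    """Yield every line except those numbered first..last (1-based, inclusive)."""
    for number, line in enumerate(lines, start=1):
        if not (first <= number <= last):
            yield line


def delete_range_in_file(infile, outfile, first, last):
    """Copy infile to outfile, dropping lines first..last, one line at a time."""
    # Binary mode avoids any decoding surprises in a multi-gigabyte file.
    with open(infile, "rb") as src, open(outfile, "wb") as dst:
        dst.writelines(delete_line_range(src, first, last))


if __name__ == "__main__":
    sample = ["one\n", "two\n", "three\n", "four\n"]
    # Drop lines 2 and 3, keeping the first and last.
    print(list(delete_line_range(sample, 2, 3)))
```

Calling `delete_range_in_file("infile", "outfile", 10, 1000000000)` would mirror the sed command, with the same hypothetical file names.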
I have not had much luck in hunting down what is wrong with my code or the Xerces-c SAX2 parser, but I do think I have successfully updated my hex editor, Hexxed, to handle 64-bit (i.e. >4GB) files.
Indeed it performs rather better than vi for some editing tasks (Hexxed has a vi-like interface).
So, if a hex editor, capable of handling little and big endian code and able to display output in Unicode is what you are after, and if you are vi-conditioned, then maybe Hexxed is your thing.
Groovy code can be found at: https://github.com/mcmenaminadrian/hexxed
A runnable jar, for those of you who have Java but are not yet Groovy, can be downloaded at: http://18.104.22.168/hexxed.jar
And there is more about it here: http://cartesianproduct.wordpress.com/2012/06/03/hexxed-usage-options/
Just remember it is code for playing with – don’t bet the farm on it. But, that said, I have no reason to think it does not work.
I now have some code that is meant to parse an XML file of approximately 5 billion lines.
Unfortunately it fails, every time (it seems), on line 4,295,025,275.
This is something of a nightmare to debug – but it looks like an overflow bug (in the xerces-c parser) of some sort.
Do I try to find it by inspection or by repeating the runs (it takes about 4 – 5 hours to get to the bug point)?
Inspection is probably quite difficult, but simple to organise. Repeating the runs is (relatively) easier, since it just means stepping through the code, but is perhaps impossible to organise: how many weeks of wall clock time in a debugger before we get to that instance?
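The overflow suspicion is easy to check with a little arithmetic. The sketch below assumes the counter involved is an unsigned 32-bit value. That is a guess, not something confirmed about Xerces-c, and `wrap_uint32` is a hypothetical helper written only for this illustration.

```python
FAILING_LINE = 4_295_025_275  # line number at which the parse fails
UINT32_LIMIT = 2 ** 32        # 4,294,967,296: one more than a 32-bit unsigned counter can hold


def wrap_uint32(value):
    """Return value as an (assumed) unsigned 32-bit counter would store it."""
    return value % UINT32_LIMIT


overshoot = FAILING_LINE - UINT32_LIMIT

print(f"failing line is {overshoot:,} lines past 2**32")                  # 57,979
print(f"a 32-bit counter would read {wrap_uint32(FAILING_LINE):,} there")  # 57,979
```

So the failure comes roughly 58,000 lines after the point where a 32-bit count would wrap. That is close enough to be suggestive, but the fact that it is not exactly at 2**32 means the overflowing quantity may be something other than the line number itself.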
Nature is perhaps the world’s leading scientific publication. Of course, these days it is a bit odd to have a scholarly journal that looks to cover all of science, but having a paper published in Nature would be a feather in anyone’s cap.
And, more than that, it was founded in, and much of it – including its subscription department – is run from the UK.
So why on Earth do they think it is acceptable to offer a subscription at the rate of $50 in the US and at £50 in the UK? At the current exchange rate the US subscription rate is equivalent to less than £32. To add to the insult, they offer subscriptions to those elsewhere in Europe for 50 Euros, which is around £43 at current exchange rates.
<!DOCTYPE lackeyml [
<!ELEMENT lackeyml (application?, (thread)*)>
<!ATTLIST lackeyml version CDATA #FIXED "0.2">
<!ATTLIST lackeyml xmlns CDATA #FIXED "http://cartesianproduct.wordpress.com">
<!ELEMENT application EMPTY>
<!ATTLIST application command CDATA #REQUIRED>
<!ELEMENT thread (instruction|store|load|modify)* >
<!ATTLIST thread tid CDATA #REQUIRED>
<!ELEMENT instruction EMPTY>
<!ATTLIST instruction address CDATA #REQUIRED>
<!ATTLIST instruction size CDATA #REQUIRED>
<!ELEMENT modify EMPTY>
<!ATTLIST modify address CDATA #REQUIRED>
<!ATTLIST modify size CDATA #REQUIRED>
<!ELEMENT store EMPTY>
<!ATTLIST store address CDATA #REQUIRED>
<!ATTLIST store size CDATA #REQUIRED>
<!ELEMENT load EMPTY>
<!ATTLIST load address CDATA #REQUIRED>
<!ATTLIST load size CDATA #REQUIRED>
]>
Incidentally, if you don’t know what any of this means, you do not need to worry. But if you are really interested then it is a DTD (Document Type Definition) for the XMLised output of Valgrind’s Lackey tool, as patched by me to also output thread (Posix pthread) data. The original – version 0.1 – DTD (which does not account for thread data) can be found in the lackey_xml git repository.
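To make the DTD a little more concrete, here is a minimal sketch that reads a tiny document of that shape and counts the records in each thread. It uses Python's standard `xml.sax` module rather than Xerces-c, and it does not validate against the DTD. The sample document, its command, its thread ids and its address values are all invented for illustration, and the names `LackeyCounter` and `count_accesses` are hypothetical.

```python
import xml.sax
from collections import Counter

# Invented sample: an abbreviated internal DTD subset plus two short threads.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE lackeyml [
<!ELEMENT lackeyml (application?, (thread)*)>
<!ATTLIST lackeyml version CDATA #FIXED "0.2">
<!ELEMENT application EMPTY>
<!ATTLIST application command CDATA #REQUIRED>
<!ELEMENT thread (instruction|store|load|modify)*>
<!ATTLIST thread tid CDATA #REQUIRED>
<!ELEMENT instruction EMPTY>
<!ELEMENT store EMPTY>
<!ELEMENT load EMPTY>
<!ELEMENT modify EMPTY>
]>
<lackeyml>
<application command="ls"/>
<thread tid="1">
<instruction address="0x4000" size="4"/>
<load address="0x7ff0" size="8"/>
</thread>
<thread tid="2">
<store address="0x7ff8" size="8"/>
<modify address="0x6000" size="4"/>
<load address="0x6008" size="4"/>
</thread>
</lackeyml>
"""


class LackeyCounter(xml.sax.ContentHandler):
    """Count instruction/store/load/modify elements per thread id."""

    RECORDS = {"instruction", "store", "load", "modify"}

    def __init__(self):
        super().__init__()
        self.current_tid = None
        self.counts = {}

    def startElement(self, name, attrs):
        if name == "thread":
            self.current_tid = attrs["tid"]
            self.counts[self.current_tid] = Counter()
        elif name in self.RECORDS and self.current_tid is not None:
            self.counts[self.current_tid][name] += 1

    def endElement(self, name):
        if name == "thread":
            self.current_tid = None


def count_accesses(document):
    """Parse a lackeyml document (bytes) and return {tid: Counter}."""
    handler = LackeyCounter()
    xml.sax.parseString(document, handler)
    return handler.counts


if __name__ == "__main__":
    for tid, counter in count_accesses(SAMPLE).items():
        print(tid, dict(counter))
```

Because SAX delivers one element at a time, the handler never holds the whole document in memory, which is the property that matters for a file of billions of lines.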