(If you don’t, get a copy of The Art of Unix Programming – there is an awful lot of rubbish in that book but it does do one thing well: explain the deep connection between text and Unix.)
In a practical sense this means that to get the best from your Unix system (and this includes you if you are a Mac OS X user) you need to boost your command line skills. The first thing to do is, of course, become familiar with a text editor – either vi or emacs (I am a vi user, but refuse to engage in a religious war on this matter).
Then, perhaps not the next thing, but one of the next things you should do is learn sed – the stream editor – one of the many gifts to the world (including Unix, of course) from Bell Labs (I recently read The Idea Factory: Bell Labs and the Great Age of American Innovation and I suppose I really ought to get around to writing a review of that).
Sed comes from the 1970s but, as so often in computing, it feels to me that its time has come again: in the era of big data, a program that lets you edit a file one line at a time, rather than trying to read as much of the file as possible into your computer’s memory, is once more just what is needed.
If you are sufficiently long in the tooth to have messed about with Microsoft’s edlin or any other line editor you might be forgiven for giving a hollow laugh at this point – but sed is a tool that genuinely repays the effort you have to make to learn it.
In the last few weeks I have been messing about with 220GB XML files and even the University of York’s big iron compute server cannot handle a buffered edit of a file that size – sed is the only realistic alternative. (Actually I thought about using my own hex editor – hexxed – which is also essentially a line editor, but a hex editor is really for messing about with binary files and I wouldn’t recommend it.)
Sed has allowed me to fix errors deep inside very large files with just a few commands, e.g.:
LANG=C sed "51815253s@^.*\$@<instruction address='004cf024' size='03' />@" infile.xml >outfile.xml
Fixes line 51,815,253 in my file (the line identified by an XML fatal error). Earlier I had executed another line of sed to see what was wrong with that line:
LANG=C sed -n '51815253p' infile.xml
(The LANG=C prefix is because the breakage involved an alien locale seemingly being injected into my file.)
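If you want to try the same inspect-then-fix pattern without risking a real file, here is a minimal sketch on a throwaway five-line file. The file contents, the line number 3 and the replacement element are all made up for illustration. The q command is my addition rather than part of the commands above: it tells sed to stop reading as soon as the wanted line has been printed, which is worth having on a very large file.

```shell
# Build a tiny stand-in for the real file (hypothetical contents).
tmpdir=$(mktemp -d)
printf '%s\n' '<a/>' '<b/>' 'BROKEN' '<d/>' '<e/>' > "$tmpdir/infile.xml"

# Inspect: print only line 3, then quit so the rest of the file is never read.
bad_line=$(LANG=C sed -n '3{p;q;}' "$tmpdir/infile.xml")

# Fix: replace the whole of line 3; every other line streams through unchanged.
# Double quotes let the XML keep its single quotes; \$ stops the shell
# from expanding $@ before sed sees it.
LANG=C sed "3s@^.*\$@<c id='3'/>@" "$tmpdir/infile.xml" > "$tmpdir/outfile.xml"

# Confirm the result by printing the same line from the new file.
fixed_line=$(sed -n '3p' "$tmpdir/outfile.xml")
```

Using @ as the delimiter for the s command, as in the original fix, saves having to escape the / in the closing />.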
Sed allows you to do much more – for instance anything you can identify through a pattern can be altered. Let’s say you have (text) documents with your old email address – me@oldaddress.com – and you want to change that to your new address – me@newaddress.com …
sed 's/me@oldaddress\.com/me@newaddress.com/g' mytext.txt > newtext.txt
Then check newtext.txt for correctness before using mv to replace the original.
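One way to do that check, sketched on a made-up two-line file: diff shows exactly which lines changed, and grep -c counts any copies of the old address that survived. Only when none are left does the new file replace the original. The sample text and the decision to gate mv on the grep count are my assumptions, not a fixed recipe.

```shell
# Hypothetical sample document containing the old address twice,
# plus a near-miss that the escaped dot must NOT match.
tmpdir=$(mktemp -d)
printf '%s\n' 'Contact: me@oldaddress.com' \
              'Backup: me@oldaddress.com, me@oldaddressXcom' > "$tmpdir/mytext.txt"

# The substitution itself; \. matches a literal dot only.
sed 's/me@oldaddress\.com/me@newaddress.com/g' "$tmpdir/mytext.txt" > "$tmpdir/newtext.txt"

# Review the changes (diff exits non-zero when the files differ, hence || true).
diff "$tmpdir/mytext.txt" "$tmpdir/newtext.txt" || true

# Count lines that still contain the old address; grep -c prints 0 if none.
remaining=$(grep -c 'me@oldaddress\.com' "$tmpdir/newtext.txt" || true)

# Replace the original only if nothing was missed.
if [ "$remaining" -eq 0 ]; then
    mv "$tmpdir/newtext.txt" "$tmpdir/mytext.txt"
fi
```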
But there is much, much more you can do with it.
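As a small taste, here are three other standard things sed can do: delete lines that match a pattern, print only a range of lines, and apply a substitution only on lines selected by a pattern. The file name and its contents are invented for the example.

```shell
# Hypothetical six-line notes file.
tmpdir=$(mktemp -d)
printf '%s\n' '# comment' 'alpha' '' 'beta' '# another' 'gamma' > "$tmpdir/notes.txt"

# Delete comment lines and empty lines.
no_comments=$(sed -e '/^#/d' -e '/^$/d' "$tmpdir/notes.txt")

# Print only lines 2 to 4 (-n suppresses the default output).
middle=$(sed -n '2,4p' "$tmpdir/notes.txt")

# Substitute only on lines that match the address pattern.
shouted=$(sed '/^beta$/s/beta/BETA/' "$tmpdir/notes.txt")
```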
Plus you get real cred as a Unix hacker if you know it.
Now, too many programs these days – especially anything from Redmond – go out of their way to suppress text formats. Text, after all, is resistant to the “embrace and extend” methodology – text wants to be free. But there is plenty of it out there still.
Books that teach you about sed are not so plentiful – I have been reading an old edition of sed & awk, which seems to be out of print, though you can buy a second-hand copy for less than a quid excluding postage costs. Well worth the investment, I’d say.