Give yourself a Christmas present: learn sed


A Shebang, also Hashbang or Sharp bang. (Photo credit: Wikipedia)

Text is at the core of The Unix Way – and all True Unix Hackers work from the command line. This much you know.

(If you don’t, get a copy of The Art of Unix Programming – there is an awful lot of rubbish in that book, but it does do one thing well: explain the deep connection between text and Unix.)

In a practical sense this means that to get the best from your Unix system (and this includes you if you are a Mac OS X user) you need to boost your command line skills. The first thing to do is, of course, to become familiar with a text editor – either vi or emacs (I am a vi user, but refuse to engage in a religious war on this matter).

Then, perhaps not the next thing, but one of the next things you should do is learn sed – the stream editor – one of the many gifts to the world (including Unix, of course) from Bell Labs. (I recently read The Idea Factory: Bell Labs and the Great Age of American Innovation and I suppose I really ought to get around to writing a review of that.)

Sed comes from the 1970s but, as so often in computing, it feels to me that its time has come round again: in the era of big data, a program that lets you edit a file one line at a time, as opposed to trying to read as much of the file as possible into your computer’s memory, is just what is needed.

If you are sufficiently long in the tooth to have messed about with Microsoft’s edlin or any other line editor you might be forgiven for giving a hollow laugh at this point – but sed is a tool that genuinely repays the effort you have to make to learn it.

In the last few weeks I have been messing about with 220GB XML files and even the University of York’s big iron compute server cannot handle a buffered edit of a file that size – sed is the only realistic alternative. (Actually I thought about using my own hex editor – hexxed – which is also essentially a line editor, but a hex editor is really for messing about with binary files and I wouldn’t recommend it.)

Sed has allowed me to fix errors deep inside very large files with just a few commands – eg:

LANG=C sed "51815253s@^.*\$@<instruction address='004cf024' size='03' />@" infile.xml > outfile.xml

Fixes line 51,815,253 in my file (the line identified by an XML fatal error). Earlier I had executed another line of sed to see what was wrong with that line:

LANG=C sed -n '51815253p' infile.xml

(The LANG=C prefix is because the breakage involved an alien locale seemingly being injected into my file.)
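As a rough sketch of the same inspect-then-fix routine on a file small enough to play with, the snippet below builds a three-line stand-in for the XML file and repairs line 2. The file contents, the temporary directory and the variable names are all made up for illustration; only the two sed invocations mirror the commands above. The `q` after `p` is an optional extra that tells sed to stop reading once it has printed the line, which can matter on a very large file.

```shell
# Build a tiny stand-in for the huge XML file (contents are invented).
tmpdir=$(mktemp -d)
infile="$tmpdir/infile.xml"
outfile="$tmpdir/outfile.xml"
printf '%s\n' \
  "<instruction address='004cf020' size='04' />" \
  "<instruction address=GARBAGE" \
  "<instruction address='004cf028' size='02' />" > "$infile"

# Print only line 2, then quit so sed does not read the rest of the file.
bad_line=$(LANG=C sed -n '2{p;q;}' "$infile")
echo "before: $bad_line"

# Replace the whole of line 2; every other line passes through unchanged.
# The script is in double quotes so it can contain single quotes, which
# means the $ anchor must be escaped to keep the shell away from it.
LANG=C sed "2s@^.*\$@<instruction address='004cf024' size='03' />@" "$infile" > "$outfile"

fixed_line=$(sed -n '2p' "$outfile")
echo "after:  $fixed_line"
```

Using @ as the delimiter in the s command avoids having to escape the / in the closing XML tag.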

Sed allows you to do much more – for instance anything you can identify through a pattern can be altered. Let’s say you have (text) documents with your old email address – me@oldaddress.com – and you want to change that to your new address – me@newaddress.com …

sed 's/me@oldaddress\.com/me@newaddress.com/g' mytext.txt > newtext.txt

Then check newtext.txt for correctness before using mv to replace the original.
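One possible way to do that check, sketched here on a throwaway file with invented contents: use diff to see exactly which lines changed, confirm that no old address survives, and only then overwrite the original. The second sample line is there to show why the dot in the pattern is escaped; an unescaped dot would match any character.

```shell
# Throwaway sample file; the contents are invented for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' \
  "Contact: me@oldaddress.com" \
  "Backup:  me@oldaddressXcom" > "$tmpdir/mytext.txt"

sed 's/me@oldaddress\.com/me@newaddress.com/g' "$tmpdir/mytext.txt" > "$tmpdir/newtext.txt"

# diff exits with status 1 when the files differ, which is expected here.
changes=$(diff "$tmpdir/mytext.txt" "$tmpdir/newtext.txt" || true)
echo "$changes"

# grep -c prints the number of matching lines (and exits 1 when it is 0).
remaining=$(grep -c 'me@oldaddress\.com' "$tmpdir/newtext.txt" || true)

# Only replace the original once no old address is left.
if [ "$remaining" -eq 0 ]; then
  mv "$tmpdir/newtext.txt" "$tmpdir/mytext.txt"
fi
```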

But there is much, much more you can do with it.

Plus you get real cred as a Unix hacker if you know it.

Now, too many programs these days – especially anything from Redmond – go out of their way to suppress text formats. Text, after all, is resistant to the “embrace and extend” methodology – text wants to be free. But there is plenty of it out there still.

Books that teach you about sed are not so plentiful – I have been reading an old edition of sed & awk – which seems to be out of print – though you can buy a second hand copy for less than a quid excluding postage costs. Well worth the investment, I’d say.


Software testers wanted


Big-little endian (Photo credit: Wikipedia)

I have now reached the point with my hex editor – Hexxed – that I can aggressively look for software testers with confidence, as I feel I have a piece of software that does all the key things I want:

  • insert (as zeros) and delete multiples of 8, 16, 32 or 64 bits at a time
  • load and save arbitrary files
  • do and undo edits

For those looking for hex editors (as opposed to just another binary editor) it will handle 8, 16, 32 and 64 bit hex representations in both little endian and big endian format (and you can switch between them), as well as display addresses in a block:offset format (block size may be set arbitrarily). For those looking for a binary editor, meanwhile, it has a vi-like interface and displays characters in both UTF-8 and 16 bit Unicode.
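For anyone unsure what those two display options mean in practice, here is a small illustration in shell arithmetic. It is not taken from Hexxed: the four bytes, the address, the block size of 512 and the decimal block:offset rendering are all assumptions chosen for the example.

```shell
# Four example bytes as they might sit in a file, lowest address first.
b0=0x12; b1=0x34; b2=0x56; b3=0x78

# Big endian: the first byte is the most significant.
big=$(printf '%08x' $(( ($b0 << 24) | ($b1 << 16) | ($b2 << 8) | $b3 )))

# Little endian: the first byte is the least significant.
little=$(printf '%08x' $(( ($b3 << 24) | ($b2 << 16) | ($b1 << 8) | $b0 )))

echo "big endian:    $big"
echo "little endian: $little"

# A block:offset address splits a flat byte address by a chosen block size.
address=$(( 0x1234 ))
block_size=512
block_offset=$(printf '%d:%d' $(( address / block_size )) $(( address % block_size )))
echo "block:offset:  $block_offset"
```

The same four bytes read as 12345678 in big endian and 78563412 in little endian, which is why being able to switch between the two views is useful.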

It doesn’t do everything, though, and it may still have bugs so testers are needed to identify what features it needs but has not got and what might go wrong with it.

I am hopeful that I may have something that could turn out to be reasonably widely used in future.

To test: you don’t need my permission, it’s free software, freely available – covered by the GNU General Public Licence (version 3). Though of course that means no warranties, to the maximum level allowable in law, are offered either.

You can pull it from git here: https://github.com/mcmenaminadrian/hexxed

Or ask me for a jar file. Get the jar file – executable via standard Java setup – via here.

Please help and test if you can.

Adding vi-like functionality to Hexxed


A diagram showing the key Unix and Unix-like operating systems (Photo credit: Wikipedia)

I have decided that I will model the keyboard interface for Hexxed on vi.

I know that is not what many (if any) coming from outside the Unix world will expect, but then there are plenty of hex editors out there and I want to make one that will appeal to at least one niche.

As I instinctively type “:w” in all sorts of places these days, I think there will be some other people out there who might like that sort of functionality too.

Going to have to try eclim


MacVim icon, glossy style
Image via Wikipedia

For my MSc project I made heavy use of the Eclipse IDE to write various Groovy programs that took an XML input and output an SVG (of course SVG is XML also, but I hope you understand).

Groovy was a great choice as, while not as fast as C, for instance, it was easy to write something that could hack XML and SVG – all I had to worry about was the algorithm as much of the infrastructure for handling the file formats was to hand.

And Eclipse made perfect sense as the IDE as it had good Groovy support.

But my problem was that I am a Vim user most of the time, and so there was more than one occasion when I had to go back and clean up a stray “:w” I had typed into a source file.

Now, it seems, there may be a solution to hand – eclim – which allows me to use VIM in Eclipse and vice versa. I will try it in the next few days and see how I get on.