The joy of regex

Deterministic Finite Automaton recognizing the...

Image via Wikipedia

A regex is, as any fule kno, a regular expression: a pretty fundamental concept in computing (the picture here shows an evaluation mechanism for one, albeit for a limited language). The keywords, names and numbers that make up a programming language can generally be described by “regular languages”, and so regexes are widely used in parsing those languages.

But what has motivated me to write this post (apart from the need to demote a story with the word “sexy” in the headline) is that regexes are the sort of thing that ought to be taught widely, rather than the low-quality stuff that seems to pass for ICT education in schools.

Why? Because they link a computing fundamental to something that would be useful in everyday life. Think of it this way: you have a document which logs 20,000 library loans, each with a record tag and a 24-hour clock time. You know about 500 of them are of interest to you because they record when, say, the system logged “DVD borrowed” or “CD borrowed” (i.e. something like “DVD borrowed:    17:03:19”). How can you extract just those?

A regex could do that in seconds, and while regexes can look complex, they are not that difficult to learn. At the risk of getting it all wrong (as I haven’t tested it), the one for this could be (depending on the precise format):

(DVD|CD) borrowed:\t([0-1]?[0-9]|2[0-3])(:([0-5][0-9])){2}

Of course, you need something with a regex engine available to crack that. Such engines are actually quite widely supported in much free and proprietary software; it’s just that nobody teaches you this hugely useful skill.
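For what it's worth, here is a minimal sketch of how the library example might look in Python, whose standard `re` module is one such freely available regex engine. Everything about the data is an assumption: the sample records are invented, the function name `extract_loans` is hypothetical, and I have used `\s+` rather than `\t` so the pattern tolerates either a tab or a run of spaces between the tag and the time.

```python
import re

# Assumed record format: "<item> borrowed:<whitespace><HH:MM:SS>", one per line.
# Group 1 captures the item type, group 2 the whole 24-hour time.
LOAN_PATTERN = re.compile(
    r"(DVD|CD) borrowed:\s+((?:[01]?[0-9]|2[0-3])(?::[0-5][0-9]){2})"
)


def extract_loans(lines):
    """Return (item, time) pairs for every DVD or CD loan found in lines."""
    loans = []
    for line in lines:
        match = LOAN_PATTERN.search(line)
        if match:
            loans.append((match.group(1), match.group(2)))
    return loans


# Hypothetical sample records, invented purely for illustration.
sample_log = [
    "DVD borrowed:\t17:03:19",
    "Book borrowed:\t09:15:00",
    "CD borrowed:    23:59:59",
    "DVD returned:\t11:00:00",
]

print(extract_loans(sample_log))
```

With a real log you would pass the open file in place of `sample_log`, since iterating over a file yields its lines one at a time. The exact pattern would still need adjusting to whatever the log format actually turns out to be.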