# Making a hash of universal credit

A hash algorithm, for computer scientists, is a way of turning one long string (some words, a number etc) into a shorter “hash code“.

Hashing is used in multiple ways – for instance to check that a file you have downloaded matches the one on the server (you hash the downloaded file data and check it matches the advertised hash on the server). This is much quicker than comparing the numbers byte by byte and, if a good hashing algorithm is chosen, the chances of the “collision” (in other words the false matching of two hashes) are low, especially for the common types of error, so you can be pretty sure a matched hash means the download is a good one.

It seems that the UK’s Department for Work and PensionsUniversal Credit” project relies extensively on hashing to check that their data on people’s incomes are correct. And the failure to match hashes is very high – suggesting some sort of fundamental failure in the system.

I have written about the enormous risk that the Universal Credit project represents – at a time of dwindling resource budgets the government is seeking to deliver an IT project that has a direct bearing on the lives of millions of the most financially vulnerable people to a very tight deadline and using a development method – “agile” – that has been sold as the answer to all the problems of government IT despite the fact most software development textbooks will tell you this is not what you should be using “agile” for.

It might still come off – certainly the department are refusing to admit that there is any possibility of failure – but I think the evidence that it is not ready is beginning to amass.

At least, though, the latest story, even if it is not encouraging in terms of the project’s overall chances of success, does show that the DWP are at least carrying out part of the “agile” function – testing the software. Previously there was next to no sign of it. But “agile” is also supposed to mean being open about progress, getting “stakeholder buy-in” (yes, I know it’s a horrible phrase, but it communicates what this is about) and getting lots of user feedback as the development process goes on. Where is any of that? The less we see of it, the more it looks like the department have got something to hide.

And, one final comment. Ruth Owen from HMRC appears to quote a much lower hash match failure rate – 5% instead of 25% – but 5% in the real world would mean literally millions of failures every year. A 5% failure rate would be as broken as a 25% failure rate.

# A further thought on MD5

The main use of MD5 – at least if my computer is any guide – is to check that a file you have downloaded from the internet or elsewhere is what it says it is.

In fact in this general use MD5 is not being used to encrypt anything – instead it produces a “message digest” – a 128 bit number that is a hash function of the supplied file. The problem with collisions in this case is that it means two different files could give the same hashed value (ie MD5 digest) and you could be left thinking you had the genuine file when you did not.

But that 128 bit hashed value plainly is not going to give you back the file – unlike CSI:Miami and everywhere where you see a “let’s enhance that” computer graphics gimmick, in the real world you cannot get more information out than you put in: so a 128 bit number will not magically transform into a 5 MB file even if you can reverse the hashing.

But that was not the issue with the Sun – they appeared to be using MD5 to hash a short password and in that case, at least in theory, being able to crack MD5 could give the original information back.

# So, is the MD5 weakness a real world problem or not?

My last posting – made in a hurry while I was waiting for a large SCP transfer to complete – has generated more traffic than anything else in the last month: possibly because it was mildly topical and largely because it was retweeted by John Rentoul, one of the UK’s leading political commentators and all-round good egg.

Maybe I was being a bit naive with it – because I took what the New Scientist said the US Department of Homeland Security said about the MD5 hashing algorithm – in short, it is completely broken and should not be used – and LulzSec’s claim to have cracked the Sun’s MD5 based password system and drew what I thought was the obvious conclusion – that an MD5 crack was in some way related to LulzSec’s attack on the Sun’s website on last Monday night.

But at least one person who ought to know more about this than me – forensic investigator Jonathan Krause – has taken issue with it and indeed with the whole idea that MD5 is a major security risk:

I have to admit I find this all a bit puzzling, as the web is full of stories like “brute force algorithm can crack 1.5 million MD5 hashes per second” and so on, as well as even some sites that allow you to look up previously brute forced hashes. (Of course 1.5 million per second is not a lot in a key space of $2^{128}$.)

Yet on the other hand I can also find no concrete example (the disputed LulzSec crack at the Sun excepted) where someone is claiming to have made a practical use of an MD5 crack.