How secure is your encryption?

Papal Encryption

Papal Encryption (Photo credit: Samuraijohnny)

The answer is, “probably quite secure, but not as secure as you might think.”

This is not a story about the NSA but about the fundamental maths of encryption – and about the ability to “guess” what an encrypted message might mean through a sophisticated version of “frequency analysis” – in other words guessing what the unencoded symbols would be on the basis of how likely they are to occur.

Up to now the assumption has been that for long enough messages encryption would essentially erase the underlying pattern of a message – in other words the encoded message would have the maximum possible degree of randomness or “entropy“.

A parallel can be made with compression – in a perfect compression algorithm entropy would be maximised: there would be no underlying pattern of 1s and 0s in the binary – as if there was a pattern then this too could be compressed (replaced by a shorter symbol) and so on. In fact most compression algorithms are pretty good – which is why you cannot repeatedly zip files and hope they will keep getting smaller.

But if you are using encryption you might not regard “pretty good” and really good enough – you would want a more or less nailed-on guarantee that your encryption would have maximum entropy, as a pattern in the coded message might point to the patterns in the underlying message.

To get that guarantee we need to show that coding the same word in “clear text” would result in a truly random selection of “code words”. Now, no mathematical process can ever truly guarantee this but it was assumed, based on what seemed to be sound reasoning, that for a sufficiently long message of “clear text” the entropy of the coded message would rapidly approach the maximum.

But report Mark Christiansen and Ken Duffy from the National University of Ireland (Maynooth) and Flavio du Pin Calmon and Muriel Medard from MIT – that assumption is flawed. In fact, in the real world, the approach to maximum entropy is a good deal slower than previously believed: in fact “conditioned” (i.e. real world) sources never get there

There is a paper discussing this – here – with some heavy duty maths. Another article – which probably does a better job than me in explaining it, (but also seems to play down the Irish connection!) is here.

The bottom line seems to be that sophisticated “brute force” attacks (also known as guessing!) might just work after all – because once you guess one word well, the rest of the code might fall into place. If the decoder has some clues about what the message might contain (think of the cryptanalysts at Bletchley Park knowing that many German messages contained weather reports) then it is possible that guesses might – just might – work.

Does this mean that your encryption is broken? Probably not. But it does mean that someone might be able to break your messages well inside the known age of the universe after all.