Failing software – again

The line above is from a real (and current at time-of-posting) job advertisement for a software developer. I’m not positing it because I think it is bad, shocking or dangerous, but mainly because it is illustrative of the real world: developers are expected to be “pragmatic” when it comes to testing the software they make for correctness.

So when, as happened today with British Airways, we see a major software failure (or at least what looks like a major software failure), maybe we should not be too surprised.

There is more to it, of course, than the fact developers are expected to compromise on testing their code: there is the simple fact that most functions are not computable at all and that for many others we don’t know how to compute them.

For an example of something uncomputable there is Hilbert’s tenth problem:

For any given a Diophantine equation is there a general algorithm that tells us whether or not there is a solution with the unknowns taking integer values…

• A Diophantine equation (named after Diophantus of Alexandria) is a polynomial (e.g., $x = y$, $y^2 + 5x + z^{67} = 0$, $3x + 2y +5x^2y^3 + 90 = 7z$ and so on) equation where all the coefficients of the unknowns are integer (counting number) values and where there are a finite number of unknowns.
• An algorithm is a set of rules of simple mathematical procedures to solve a problem

So the task is to take a Diophantine equation and not to find a solution but to determine whether such a solution exists – but no such algorithm exists and therefore we cannot write any sort of computer program that would do this for us, no matter how powerful a computer we were using.

And for an example of something for which a solution does exist but which we will struggle to find, there is the famoustravelling salesman problem” – namely given a set of cities all connected by roads (or railways, or air routes) what is the most efficient way of visiting all the cities.

This problem is an example of what is known as an “NP” problem – in other words one for which (we believe) no general algorithm exists to produce a solution in “polynomial time” (meaning in a time related to the length of the input – in this case the number of cities).

Plainly a solution does exist – there is a quickest way to travel to all the cities – but the only immediately available way to be certain of finding this is to try all the solutions.

But that will take time: for even moderately sized problems quite possibly more time than we have before the Sun swallows the Earth. To get round this we have to use heuristics (essentially educated guesswork) or approximations.

In the travelling salesman case those approximations are powerful and efficient – that’s why modern logistics works – but the general point remains the same: in many cases we are building software that approximates the solution to our problem but may not answer it fully correctly and, unless there is a breakthrough which eliminates NP problems as a class, we are always going to be stuck with that.

Why Candy Crush is so difficult – and popular?

If the evidence of my commute is any guide then Candy Crush Saga probably uses as much compute cycles as any other program in London – it is wildly popular.

And if you are player (I am not) then you shouldn’t be surprised or disappointed if you actually struggle with it – because, in fact, there are no obvious “solutions” to the “problems” you face as a player.

Toby Walsh of the University of South Wales has published a paper than shows that the “problems” of Candy Crush are in the class of what is known as “NP”.

NP problems are those which we do not know in advance what the solution is, but for which we can relatively quickly check that the solution applied is correct. In fact NP problems are at the core of internet (and most serious) encryption – internet encryption is difficult to crack in an attack but decryption is easy if the keys  are known.

NP problems are in distinction from “P” problems for which we know, in advance, both a procedure to solve the problem and to check our answer is correct. The “P” here stands for “polynomial time” – meaning we can tell how long the problem will take to solve at the start because it is some multiple (a polynomial) of its initial complexity – long division by pen and paper is a good example from everyday life – we know how to do it and we can instinctively tell how long it will take just by looking at the length of the numbers we are dividing.

Some – but not many – mathematicians believe that NP problems are really just P problems in disguise – it is just that we have not just discovered the way to solve them. In fact it can be shown that if we could find a means of solving a special class of NP problems known as “NP complete” problems then we could solve all NP problems. But most mathematicians believe that NP problems are inherently not amenable to being solved by simple procedures in polynomial time, while yet others think that such a procedure might exist but that the polynomial will be so large that we might see little advantage in cracking the problem.

In the meantime a whole industry has arisen to offer software to solve – though heuristics (informed guesses) and approximations some of the most important NP problems – such as scheduling school or train timetables.

It’s probably not an unreasonable assumption that people like Candy Crush precisely because it is so difficult and because, it would appear, no one is ever going to come up with a way of quickly solving the problems. We can get better by playing more – so that our heuristic sense of what works increases, but no one is going to emerge as an invincible player – because that is inherently not possible. Probably.

(If you want to know more and are not afraid of some maths, then P, NP, and NP-Completeness: The Basics of Computational Complexity might be worth having a look at.)