## Failing software – again

The line above is from a real (and current at time-of-posting) job advertisement for a software developer. I’m not positing it because I think it is bad, shocking or dangerous, but mainly because it is illustrative of the real world: developers are expected to be “pragmatic” when it comes to testing the software they make for correctness.

So when, as happened today with British Airways, we see a major software failure (or at least what looks like a major software failure), maybe we should not be too surprised.

There is more to it, of course, than the fact developers are expected to compromise on testing their code: there is the simple fact that most functions are not computable at all and that for many others we don’t know how to compute them.

For an example of something uncomputable there is Hilbert’s tenth problem:

For any given a Diophantine equation is there a general algorithm that tells us whether or not there is a solution with the unknowns taking integer values…

• A Diophantine equation (named after Diophantus of Alexandria) is a polynomial (e.g., $x = y$, $y^2 + 5x + z^{67} = 0$, $3x + 2y +5x^2y^3 + 90 = 7z$ and so on) equation where all the coefficients of the unknowns are integer (counting number) values and where there are a finite number of unknowns.
• An algorithm is a set of rules of simple mathematical procedures to solve a problem

So the task is to take a Diophantine equation and not to find a solution but to determine whether such a solution exists – but no such algorithm exists and therefore we cannot write any sort of computer program that would do this for us, no matter how powerful a computer we were using.

And for an example of something for which a solution does exist but which we will struggle to find, there is the famoustravelling salesman problem” – namely given a set of cities all connected by roads (or railways, or air routes) what is the most efficient way of visiting all the cities.

This problem is an example of what is known as an “NP” problem – in other words one for which (we believe) no general algorithm exists to produce a solution in “polynomial time” (meaning in a time related to the length of the input – in this case the number of cities).

Plainly a solution does exist – there is a quickest way to travel to all the cities – but the only immediately available way to be certain of finding this is to try all the solutions.

But that will take time: for even moderately sized problems quite possibly more time than we have before the Sun swallows the Earth. To get round this we have to use heuristics (essentially educated guesswork) or approximations.

In the travelling salesman case those approximations are powerful and efficient – that’s why modern logistics works – but the general point remains the same: in many cases we are building software that approximates the solution to our problem but may not answer it fully correctly and, unless there is a breakthrough which eliminates NP problems as a class, we are always going to be stuck with that.

## A (partial) answer to my Goedelian conundrum?

Last week I puzzled over what seemed to me to be the hand waiving dismissal, by both Alan Turing and Douglas Hofstadter of what I saw as the problem of humans being able to write true statements that the formal systems employed by computers could not determine – the problem thrown up by Goedel’s Incompleteness Theorems.

Well, Douglas Hofstadter has now come to his own (partial) rescue as I continue to read on through Godel, Escher, Bach – as he describes Tarski’s Theorem, which essentially states that humans cannot determine all such statements either (unless we posit the Church-Turing thesis is wrong, of course and there is some inner human computational magic we have yet to describe).

I am now going to quickly run through Hofstadter’s exposition – it might not mean too much to those of you not familiar with GEB, but if so and if you are interested in computation (and genetics and music) and you want to improve your mind this summer you could always think about buying the book. I don’t promise it’s an easy read, the style can vary from the nerdy to the deeply frustrating, but it is still a rewarding read.

So here goes:

We imagine we have a procedure that can determine the truth of a number theoretical statement, i.e.:

$TRUE\{a\}$ can tell us whether the number theoretical statement with Goedel number a is true.

So now we posit, $T$ with Goedel number $t$:

$\exists a:<\sim TRUE\{a\} \wedge ARITHMOQUINE\{a^{\prime\prime},a\}>$

Now, of you have read GEB you will know that to “arithmoquine” a number theoretical statement is to replace the free variable – in this case $a^{\prime\prime}$, with the Goedel number for the statement itself…

$\exists a:<\sim TRUE\{a\} \wedge ARITHMOQUINE\{SSS \ldots SSS0,a\}>$

Which we can state as “the arithmoquinification of $t$ is the Goedel number of a false statement”.

But the arithmoquinification of $t$ is $T$‘s own number, so this statement is the equivalent of saying “this statement is false”: just another version of the famous Epimenides Paradox, but one that is decidedly not hand waving in form: it’s about natural numbers.

The outcome is that $TRUE\{a\}$ cannot exist without our whole idea of natural numbers collapsing and we are forced to conclude there is no formal way of deciding what is a true statement of theoretical number theory using only theoretical number theory – and so humans are no better off than computers in this regard: we use concepts from without the formal theory to establish truth and we could surely program our computers to do the same. Turing’s “imitation game” conception of intelligent machines thus survives.

At least I think so!

## More than a game: the Game of Life

Conway’s Game of Life has long fascinated me. Thirty years ago I wrote some Z80 machine code to run it on a Sinclair ZX80 and when I wrote BINSIC, my reimplentation of Sinclair ZX81 BASIC, Life was the obvious choice for a demonstration piece of BASIC (and I had to rewrite it from scratch when I discovered that the version in Basic Computer Games was banjaxed).

But Life is much more than a game – it continues to be the foundation of ongoing research into computability and geometry – as the linked article in the New Scientist reports.

For me, it’s just fun though. When I wrote my first version of it back in 1981 I merely used the rubric in Basic Computer Games – there was no description of gliders or any of the other fascinating patterns that the game throws up – so in a sense I “discovered” them independently, with all the excitement that implies: it is certainly possible to spend hours typing in patterns to see what results they produce and to keep coming back for more.

• “Life.bas” should run on any system that will support the Java SDK – for instance it will run on a Raspberry Pi – follow the instructions on the BINSIC page. A more up to date version may be available in the Github repository at any given time (for instance, at the time of writing, the version in Git supports graphics plotting, the version in the JAR file on the server only supports text plotting). On the other hand, at any given time the version in Git may not work at all: thems the breaks. If you need assistance then just comment here or email me adrianmcmenamin at gmail.

## My (computer science) book of the year

It wasn’t published in 2011 but The Annotated Turing: A Guided Tour Through Alan Turing’s Historic Paper on Computability and the Turing Machine is without doubt my computer science book of the year.

If books can change your life then The Annotated Turing changed mine – because it showed me just how strong the link between maths and computer science is and how fundamental maths is to an understanding of the nature of reality and truth. The world, in my eyes, has not been the same since I read this book last January.

If you are a computer science student the you must read this book!

And finally, Happy New Year to all.

## Very small Turing machines

I am pretty busy with work now, so one of the things I had planned to do – write a simple Turing Machine in Groovy – will have to wait.

In the meantime here are some very small Turing machines to wonder over.

## The Great Alan Turing

Slashdot have a story to the effect that Leonardo DiCaprio is to play Alan Turing in a film that will mark the mathematician’s centenary next year.

Great news – the man’s memory deserves nothing more than the actor who has proved himself to be both great and edgy in recent work (he’s certainly not the milque toast figure the start of his career briefly suggested.)

As a geek, of course, I hope that the film will try to explain, just a little his achievements.

But how can you explain the ideas of computability and the Church-Turing thesis in a popular film? A tough one, but I suppose you could do something.

The Bletchley Park “bombe” and the idea that the weakness of the German Enigma machine – that it would never map a letter to itself (eg., in any message “e” would never be encrypted as “e”) – could be used to break the code (if a combination of a guessed plain text, usually a weather report, at the start of the message , and the initial key settings produced code that mapped letters to themselves then the initial settings were wrong) – is probably easier to explain.

And don’t forget about SIGSALY, the voice encryption system Turing worked on with Bell Labs. As a piece of engineering this is probably impossible to over-estimate in importance: as the first practical pulse code modulation system it could even be said to be the mother the mobile phone, or at least its grand aunt.

And, of course, let me again plug my book of the year: The Annotated Turing: A Guided Tour Through Alan Turing’s Historic Paper on Computability and the Turing Machine

## Reflections on the riots: part one

This is a blog about computing (along with some maths and science) – and not about politics, and having disciplined myself to stick to that for the last nine months, I intend to keep it that way, even as I write about the biggest political event of the year.

But I will allow myself two short political observations: firstly, that disrespect for the law and contempt for order are not new things in London. If you read Albion’s Fatal Tree you will see that there have long been many in the capital who have made their at least part of their livelihoods from criminality and who celebrated their fellows. Pretending that recent events represent some terrible breakdown in ancient respect for authority is ahistorical.

And, before people start to say it is the fault of rap music or other “alien” influences, do they remember this? Perhaps the Fast Show is the real cause of the disorder?

So, that over, what is the science point? Well, it was consistently reported during last week’s disturbances that the looters were sharing their intelligence through BlackBerry smart phones, specifically through “BlackBerry Messenger” (BBM). Given that the UK has one of the most sophisticated signals intelligence set-ups in the world at GCHQ, the fact that the police were clearly left in the halfpenny seats by the looters suggests to me that nobody there has yet proved that P=NP or developed an algorithm to crack the one way functions used to  encrypt the BBMs.

According to Wikipedia Blackberry encrypt everything with “Advanced Encryption Standard” (AES). A brute force attack on this would, on average, require $2^{255}$ attempts (for the 256 bit encryption), so that is not a practical option (eg the universe is very roughly $4^{17}$ seconds old).

Now, it could be that the US government has cracked this thing and just refuses to tell even its closest ally (I dare say the name Kim Philby is still spat out in various places), but my guess is that AES is safe, for now.

As I have said before that is probably a pity: while a world where P=NP would be one where internet commerce was broken, it would also be one with many compensatory benefits.

## “Basically, you would be able to compute anything you wanted”

The quote that forms the title here comes from Lance Fortnow, a computer scientist at Northwestern University, in an article (here – subscription required) in the current edition of the New Scientist on the $P = NP$ question.

It’s an odd statement for a computer scientist to make – most numbers are transcendental numbers and so are fundamentally incomputable: for instance there are $\aleph_{0}$ (aleph null – the smallest of Cantor’s hypothesised infinities) transcendental numbers between 0 and 1 (or between any range of integers).

But besides that oddity it is actually a very good article – calling the world where $P = NP$ Algorithmica – “the computing nirvana”.

I have written before of how much I hope we do live in Algorithmica, though the consensus is we live in a world of NDAlgorithmica (ND for non-deterministic).

The article’s beauty is that it also examines the states between the two: what if, for instance, we discovered that the class of $P$ problems were identical to the class of $NP$ problems but that we could not find the $P$ algorithm, or that the $P$ algorithm was of such a degree of complexity it “would not amount to a hill of beans”?

## Cardinality of the set of all strings

I finished John Naughton‘s A Brief History of the Future: Origins of the Internet – an interesting diversion, to be sure and a book worth reading (not only because it reminds you of how rapidly internet adoption has accelerated in the last decade.)

By now I should be on to proper revision, but I indulged myself last week and bought a copy of P, NP, and NP-Completeness: The Basics of Computational Complexity which I am now attempting to read between dozes in the garden (weather in London is fantastic). The book is written in a somewhat sparse style but even so is somewhat more approachable that many texts (the author, Professor Oded Goldreich, rules out using “non-deterministic Turing machines“, for instance, saying they cloud the explanation).

But in his discussion of (deterministic) Turing machines he states:

$the~set~of~all~strings~is~in~1-1~correspondence~to~the~natural~numbers$

Surely the set of all strings has the cardinality of the reals?

If we had a set of strings $(0,1)^*$ like this:

$1000000....$
$1100000....$
$1110000....$
$1111000....$

and so on…

Then surely the standard diagonalisation argument applies? (i.e. take the diagonal $1111111....$ and switch states of each member – $000000...$ and this string cannot be in the original set as it is guaranteed that for the $i^{th}$ member of the set, the with elements $\alpha_{1 ... \infty}$, $\alpha_{i}$ will be different. (See blog on diagonalisation.)

In Naughton’s book he makes (the very valid) point that students of the sciences are generally taught that when their results disagree with the paradigm, then their results are wrong and not the paradigm: so what have I got wrong here?

## The diagonal proof

I am just reading Computability and Logic and it (or at least the old, 1980, edition I am reading) has a rather laboured explanation of Cantor’s 1891 proof of the non-enumerability of real numbers (the diagonal proof) so to reassure myself and because it is interesting I thought I’d set out a version here (closely modelled on that found in The Annotated Turing).

If you are already lost then – here’s what it’s about: it is a proof that the infinite number of real numbers (that is to say numbers with something after the decimal point eg 3.1) is a bigger infinity that the infinite number of integers – or if you want think in terms of fractions – the rational numbers. One – the infinite number of integers or rationals – is countable, the other is not.

This is a pretty mind-blowing proof and it is also fair to say that a minority of mathematicians have issues with Georg Cantor‘s theory of the continuum, but that is for another blog post, perhaps.

So here’s the example:

Let’s start setting out the real numbers that are less that one:

0.00000000000..... 0.10000000000...... 0.20000000000...... 0.30000000000...... 0.40000000000...... 0.50000000000...... 0.60000000000...... 0.70000000000...... 0.80000000000..... 0.90000000000..... 0.01000000000..... 0.02000000000.... ..... 0.11000000000.....

And so on. Now this looks like it might be enumerable (countable) by assigning an integer to element in the list- the first in the list is number 1, the second number 2 and so on. Hopefully you can see that this should, in theory, list all the numbers (except it doesn’t, as we will prove)

Bur let’s us take the number formed by the diagonal. In our case this is:

0.00000000000..... 0.10000000000...... 0.20000000000...... 0.30000000000...... 0.40000000000...... 0.50000000000...... 0.60000000000...... 0.70000000000...... 0.80000000000..... 0.90000000000..... 0.01000000000..... 0.02000000000....

ie 0.0000000000.......

And let’s add 1 to every digit, so we get 0.111111111111....: is this number in the list? We will now show you that it cannot be by checking off the numbers in the list one by one.

First of all, we can eliminate the first number in the list as a match, because its first digit is 0 and our number has 1 as the first digit. Then we can eliminate the second number in the list in the same way (ie it’s second digit does not match). In fact it is clearly the case that we can eliminate number N from the list because it will not match at digit N.

OK, you say, let’s fix that by adding this ‘new’ number to the list:

0.11111111111...... 0.00000000000...... 0.10000000000...... 0.20000000000...... 0.30000000000...... 0.40000000000...... 0.50000000000...... 0.60000000000...... 0.70000000000...... 0.80000000000...... 0.90000000000...... 0.01000000000...... 0.02000000000......

But let’s diagonalise that one:

0.111111111111..... 0.00000000000...... 0.10000000000...... 0.20000000000...... 0.30000000000...... 0.40000000000...... 0.50000000000...... 0.60000000000...... 0.70000000000...... 0.80000000000...... 0.90000000000...... 0.01000000000...... 0.02000000000......

And we get 0.1000000000...... Once again adding 1 to all the digits we get a new number 0.2111111111111.... – which we can show in the same way does not exist in the original list (set)