Racey code can damage your (mental) health

Running the Hackney half marathonI’ve had a tough week. Ran a half marathon (quite badly, but I finished – see the picture) on Sunday, hobbled around York University (blister on my foot) on Monday before returning to London and coping with the added stress of a new job.

All the while the failure of my latest piece of code to work was annoying me too – eating away at me and making me wonder if the whole PhD thing was a bit of Quixotic nonsense or even Panzan stupidity.

Anyone who has chased down a bug for days will know the feeling – the root cause must be out there somewhere, even if you have been through your code dozens of times and can see nothing wrong.

Sitting on a long commute to my place of (temporary) work this morning I realised that my problem could only be down to one thing – a race condition.

I am emulating an LRU/2 type page replacement policy – with two page queues – a “high” and a “low”: pages are initially assigned to low, but can be pushed into high on a second reference. Pages only leave via low (they can be pushed out of high into low if we run out of space in high and so on).

With one thread running there was no problem, but when a second thread came on board the application fell over – and I realised it was because the manipulation of the LRU queues was not atomic.

And, a few dozen code changes later (done on the reverse journey back to London) – and some accounting for C++ template strangeness (it seems that an iterator across a map, means the map is not const – that took me a while to work out) and the problem has gone away and suddenly academic endeavour feels worth it again.