In my PhD world a year’s worth of software experimentation has proved what we all knew already … that systems using traditional memory models struggle in the Network-on-Chip environment and so I am now trying something slightly different.
My “model” (it’s all in software) is of a 16 core system, with each core having a small amount of on-chip memory (32k), which are combined together to form a flat memory space. Memory in this space can be accessed quickly, memory outside it, in the next level up in the hierarchy, is roughly 100 times further away.
Using any form of traditional paging model (including Belady’s optimal page replacement algorithm) this system starts to thrash on even moderate loads – the cost of moving pages in and out of the local memory determines performance and so there is no benefit from adding additional processors (in fact it just slows the individual processors down).
Such an outcome makes any promise of improved performance from parallelism void – it does not really matter how efficiently you have parallelised the code (some corner cases excepted – eg if all chips were accessing the same memory at the same time), you are trapped by a memory I/O bound.
So now I want to look at alternatives beyond the usual 4k (or 2k) paging – but I have been struggling all week to get the locking semantics of my code right. Concurrency is hard.
The one thing that debugging parallel code and locks teaches you again and again is never to assume that some event will be so rare you don’t need to bother about it: because when you are executing millions of instructions a second, even rare events tend to happen.
It has also taught me to check return values – code that will “always” work in a single threaded environment may actually turn out to be quite a tricky customer when running in parallel with other instances of itself or when it is accessing shared memory.
But, finally, the main lesson this week has been about going atomic.
I have a tendency to think – if I can release that lock for a few lines of code that might improve overall performance and I can just lock it again a little later. Beware of that thought.
If you need to make a series of actions atomic you need to hold the same lock across them all – releasing it for even a few lines breaks atomicity and will quite likely break your code.