Mo Salah, penalties and artificial intelligence


Last night, as Leicester City played Liverpool, Mo Salah, currently the top scorer in the English Premiership, was brought down in the penalty box and then stepped up to take the (correctly awarded) penalty.

Salah hardly ever misses a penalty. It’s about as nailed on as you can get that he will score. Except last night he didn’t. Something – a badly placed kick, a goalkeeper at the top of his game, a momentary lack of concentration, who knows – meant he kicked the ball into the keeper’s hands (and then failed to score on the rebound).

Well, it happens. Form is temporary, class is permanent, as the saying goes. But what struck me was that I knew – I just knew – he was going to miss. Something in the few seconds before he struck the ball made me think he wasn’t going to do it.

To be honest, this is quite a common feeling (for me, anyway). But not all the time. David O’Leary’s famous “the nation holds its breath” penalty never bothered me at all. Never in doubt.

But the thing that interests me here is that I am sure I am not the only one who feels this way about penalties, and furthermore my belief – untested – is that when I feel someone is going to miss, I am right more often than I am wrong.

An unverified search on the internet says that 70% of penalties awarded in the English top flight are scored, so testing people’s ability to predict a miss ought not to be too hard (at those odds a random miss-predictor would be relatively easy to spot).
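
If that 70% figure is roughly right, a predicted miss comes true by pure luck only about 30% of the time, so even a modest record of correct calls would quickly look non-random. A minimal sketch of the sums (the eight-correct-out-of-ten record is purely hypothetical):

```python
from math import comb

# If 70% of penalties are scored, someone who predicts a miss at random
# will be right about 30% of the time.  How surprising would it be to
# call, say, 8 misses correctly out of 10 such predictions?
p_miss = 0.3
n, k = 10, 8  # hypothetical: 10 "he's going to miss" calls, 8 correct

# One-sided binomial tail: probability of k or more correct calls by luck.
p_value = sum(comb(n, i) * p_miss**i * (1 - p_miss)**(n - i) for i in range(k, n + 1))
print(f"P(at least {k} correct out of {n} by chance) = {p_value:.4f}")  # about 0.0016
```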

If people can do it through some sort of split-second, multiply-parallel assessment of the factors – the morale of the team, the state of the game, the look of the keeper and, above all I think, the look and feel of the man taking the kick – the question is then whether we could build a machine that did this too.

What would be the point of that? Well, apart from the “it’s interesting” factor, there is presumably an arbitrage opportunity out there somewhere that would allow those informed by the machine to bet on the outcome. That, of course, would be of no net social benefit to humanity as a whole and probably isn’t something to be encouraged, but I still think it’s fascinating.

A book recommendation


Lying in the bath this morning – relaxing into the Christmas holiday despite the epidemic of Omicron-variant Covid currently rampaging through London in an increasingly frightening manner – I had thoughts about several blog posts to write over the break, and this is the first…

It’s a book recommendation: Ian Stewart’s Concepts of Modern Mathematics. It’s around 50 years since its first edition (and even this Dover edition is 26 years old) but – apart from the disconcerting habit of assuming all humans are male – it doesn’t feel like a dated work at all.

In one other way, though, it is a product of its times: the controversy over “new mathematics” – the teaching methods adopted widely across the West in the 60s and 70s in an effort to improve students’ mathematical understanding and move beyond rote-teaching. Interestingly, as the Wikipedia article demonstrates, its adoption was driven, in part, by the post-Sputnik panic in US science and engineering.

Its adoption was certainly controversial, as Tom Lehrer makes clear here…

Now, having been educated in both Ireland (primary and secondary) and England (secondary), I have actually seen both approaches in operation at first hand. It’s certainly the case that the traditional methods gave me a solid (algorithmic, in the sense of using a simple and repetitive method) grounding in such things as long division, which I think was missing in England; but it’s also not difficult to see why discussions of things like sets, modular arithmetic and matrices are pretty essential for anyone thinking of a career in science and technology.

That said, the UK’s School Mathematics Project seems to have avoided what appear to have been the excesses of “new math” in the US. I cannot produce any evidence for it, but I am tempted to wonder whether the UK’s 1980s (and continuing) software boom was in part driven by its innovation in maths teaching (read Francis Spufford’s excellent Backroom Boys for more on that).

Indeed these things are so fundamental to software development that it’s hard to imagine a case for not teaching them – but 50 years ago they were clearly seen as in some way esoteric or even fetishistic.
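
To make that concrete – my own illustration, not the book’s – here is how a couple of those ideas turn up in entirely routine code: modular arithmetic picks a hash-table bucket, and set operations handle membership and de-duplication.

```python
# Modular arithmetic: mapping an arbitrary hash value onto a fixed
# number of buckets, as every hash table does.
n_buckets = 8
bucket = hash("Concepts of Modern Mathematics") % n_buckets

# Sets: membership testing and de-duplication in a line each.
topics = {"sets", "matrices", "modular arithmetic"}
print(bucket, "matrices" in topics, topics | {"groups", "sets"})
```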

Professor Stewart’s book concentrates on understanding the concepts rather than on rigorous proofs – which makes it readable and digestible in settings like my bath. I am not going to say it could be mastered by anyone regardless of their current understanding of maths, but it’s not a traditional textbook and I would suggest it would be a good backgrounder if you had basic (O level/GCSE) maths qualifications and were interested in working as a software developer.

One of the points the author makes in his preface is that the UK has seen its science and technology base erode as more and more graduates look for careers in law and financial services. I hope that will become the subject of another blog post here shortly.

A great Christmas present?


I haven’t quite finished it yet – but I do want to strongly recommend Ananyo Bhattacharya’s “The Man from the Future: The Visionary Life of John von Neumann” – not just because the author was very gracious when I, as some random on the Internet, tweeted him to tell him of a (very small) error in the text, but also because it’s such a stimulating book that – like its subject – covers so many areas from nuclear war to Conway’s Game of Life and back again.

I am not a great one for biographies that go in for long examinations of the subject’s childhood, schooling and dietary preferences, and this book gets the balance right by looking principally at von Neumann’s scientific and engineering work. But it also covers lots of other ground – reintroducing me to this massive tome for instance (which I bought on a whim years ago and did read some of, but didn’t quite get the point of – now I have a better idea).

Years ago, as a(n astro)physics undergraduate, I was pretty sniffy about von Neumann – after all, he was a big friend of Teller and Teller was evil, right? Well, I haven’t particularly changed my opinion of Teller but I have of von Neumann, and this book has certainly accelerated the journey. Not that “Jonnie” is a great hero – in many ways he sounds pretty hard to take – but his contribution to science, maths and engineering is just so huge. (Thankfully not all of his ideas – such as “pre-emptive” nuclear war – were acted on.)

Definitely a good book as a Christmas present for curious sixth formers and older.

First year as a software engineer


Today marks the anniversary of my starting work as a software engineer. I love the job – despite some of the real challenges I’ve faced in a radical career change – and I do feel so very lucky to have got it at an exceptionally difficult time.

Some of the changes are about how I see myself – after more than 30 years of working in communications and public policy I was (regardless of what others thought about me, or even whether I was right or wrong) generally very confident in my own judgment and ideas. I’d been around the track – more than once – and whether you liked it or not I had a view I’d generally express. Now I am the start-of-career, not-long-out-of-university beginner. It can be daunting sometimes and I get things wrong, though my colleagues are generally happy to help (even if everyone working remotely does sometimes make that a little harder).

Well, I’m not quite new to everything – I know how corporate things work and while I am nobody’s manager I do know what that is about too (or at least I think I do).

Secondly, I am very much working in an engineering environment and not a computer science one. The differences are subtle and I cannot quite articulate them, but they are certainly real. “Scientists” (whether computing scientists – applied mathematicians really – or hard scientists) and engineers do tend to look at each other a little warily, and until now I’d always been on the “other” side of this. But I am getting used to this too.

Above all it’s great to work somewhere where every day I am expected to think and apply that thought to solve novel and interesting problems.

The problem with parallelisation


Just over 15 years ago a big change in general computing occurred – computer hardware more or less stopped getting faster.

The previous three decades of ever-faster computer hardware had been supported by the ability to etch ever-smaller transistors on silicon. This, together with the ability to use bigger silicon wafers, led to what is known as Moore’s Law – a doubling of the number of transistors on a chip (latterly) every 24 months or so.

But as each transistor uses power, and as faster computers use more power at a given voltage, the magic of ever-faster computing was also dependent on Dennard Scaling – namely that as transistors got smaller, their energy use also fell. But Dennard Scaling more or less ground to a halt at the start of the century, and by 2003/4 computing manufacturers had to come to terms with the fact that while chips could still pack more transistors in, they couldn’t keep getting faster.

The implications of this have been profound. First of all the previously highly buoyant desktop computing market more or less fell off a cliff. If computers no longer doubled in speed every 24 – 36 months there was no need to keep buying new ones. (Smaller, lighter computers did become more viable and cheaper though – this is why laptops were for show-offs in the 20th century but are commonplace today.)

Instead of building ever-faster chips, manufacturers now use the extra transistors to add processors (cores) to the same chip: the idea being that if we can break computing tasks down into parallel sub-tasks, we can still get better performance from our systems by using multiple cores running at slower speeds instead of one big core running at a super-fast speed. Of course, in any case, we can’t actually have that super-fast speed: the chip would use too much energy, get too hot and be in danger of ceasing to function properly at all.

But concurrency is hard – it’s very difficult to write efficient parallel code, especially because of what is known as Amdahl’s Law, which essentially says that it doesn’t matter how much parallel code you have if you keep having to execute significant bits on a single processor – and (this is the killer for multi-core systems) that single processor is now slower because of your multi-core design.

Parallel code needed for multi-core speedup

For my PhD I made a projection (using a formula found in this paper) of just how parallel code has to be to get better performance (see above, where f is the fraction of code that is parallel), and the results are sobering. For code that is 99.9% parallel, using 1000 processors (each of which is about 250 times slower than the one fast chip they collectively replace) we can more or less double the speed. But if the code is only 99% parallel, far from doubling the effective speed of execution, we end up more than halving it. And 99% parallel code is very, very difficult to achieve – imagine running 1000 tasks in parallel while spending only 1% of all time co-ordinating the processes.
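
As a rough check on those figures, here is a minimal sketch of the arithmetic: Amdahl’s Law applied to cores that are individually slower than the single fast core they replace. The 250-times-slower figure is the one taken from the Pareto-frontier formula in the referenced paper; the code itself is just illustration.

```python
def speedup(f, n_cores, core_slowdown):
    """Speedup over one fast core when its replacements each run
    core_slowdown times slower; f is the parallelisable fraction."""
    serial = (1 - f) * core_slowdown          # serial part runs on one slow core
    parallel = f * core_slowdown / n_cores    # parallel part spread over all cores
    return 1 / (serial + parallel)

for f in (1.0, 0.999, 0.99):
    print(f"{f:.1%} parallel: {speedup(f, 1000, 250):.2f}x")
# 100.0% parallel: 4.00x  (the theoretical maximum mentioned in the addendum below)
# 99.9% parallel:  2.00x  (roughly doubling the speed)
# 99.0% parallel:  0.36x  (less than half the speed of the single fast core)
```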

The difficulty is compounded by the fact that even though the cores have multiplied, the other systems on the computer have not – generally speaking we still have one memory and one storage device, and so cores have to queue to access them (this is, essentially, what my PhD was about).

When I started the PhD as a part-time student in 2012 the firm expectation in industry was that we would by now be well into the era of 1024-core chips. That simply hasn’t happened, at least in part because there is no firm commercial reason for it: even if processors with that many cores could be mass-produced (and they probably could), they would actually be slower in the real world than processors with much smaller numbers of cores in all but a few specialised domains.

(For those that are interested I expect the PhD thesis to be online shortly, I’ll post a link when it is.)

Addendum: A few people have questioned where the figure for “250 times slower” comes from – someone even accused me of making it up! In the referenced paper there is a formula for the Pareto frontier for core performance – in other words the limit of maximal efficiency. This is where the factor for individual core slowdown comes from – in fact the 250 figure relates to 1024 cores but the 1000 core figure would be similar. Simple maths shows that the maximum speed this assembly could manage is 4 times greater than a single high-speed core but that depends on having 100% parallel code.

It’s also worth noting that this formula is based on a particular node of chip manufacture – the so-called 45nm (nanometre) node (the node number used to relate to the typical size of components etched on silicon but is now just a generic term for a particular manufacturing/fabrication process). Leading edge chips are now down to the 7nm node so higher degrees of efficiency are (probably) possible as a result: but I haven’t got a formula to calculate those and the basic principle – that we need highly parallel code to get the most from many-core designs – doesn’t alter.

I love binary trees


A few years ago I was very proud of myself for managing to translate the original Pascal of the Reingold-Tilford algorithm into C/C++, and ever since then I have had a fascination for drawing binary trees.

NoC with binary memory tree


The one drawn here was actually done “by hand” in Metapost – I realised only as I finished it that I could have, or maybe should have, used an algorithmic approach rather than a heuristic one (“does it look OK?”). Metapost is a pretty inflexible programming tool (though, as this book shows, it can produce some beautiful results) and I reckon a decent tree-drawing algorithm in it ought to be worth at least an MSc.
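
For comparison, the crudest algorithmic layout I know of – nothing like as clever or as compact as Reingold–Tilford, and offered purely as a sketch – simply places each node at its in-order position horizontally and its depth vertically:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    x: int = 0   # filled in by layout()
    y: int = 0

def layout(root: Optional[Node]) -> None:
    """Assign x = in-order position, y = depth to every node."""
    counter = 0
    def visit(node: Optional[Node], depth: int) -> None:
        nonlocal counter
        if node is None:
            return
        visit(node.left, depth + 1)
        node.x, node.y = counter, depth
        counter += 1
        visit(node.right, depth + 1)
    visit(root, 0)
```

The coordinates could then be emitted as Metapost (or any other) draw commands – wasteful of width, but no two nodes ever overlap.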

A fractal in the real world


Of course, real world fractals do exist, but it is interesting to appear to have “discovered” one, or at least a pattern which certainly shows many fractal-like properties.

I am plotting the relative efficiency of different page replacement algorithms – and here I am looking at Belady’s OPT (or MIN), or as it is sometimes called, the clairvoyant page replacement algorithm.
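
For anyone who hasn’t come across it: OPT evicts the page whose next use lies furthest in the future (or that is never used again), which of course requires knowing the future – hence “clairvoyant”. A minimal sketch of the policy, counting page faults rather than the idle ratio I measure below:

```python
def opt_faults(refs, frames):
    """Page faults under Belady's OPT for a reference string `refs`
    with `frames` page frames.  O(n^2) scan: fine for illustration."""
    resident, faults = set(), 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) < frames:
            resident.add(page)
            continue
        # Evict the resident page whose next use is furthest away
        # (or which never appears again).
        def next_use(p):
            for j in range(i + 1, len(refs)):
                if refs[j] == p:
                    return j
            return float("inf")
        resident.remove(max(resident, key=next_use))
        resident.add(page)
    return faults

print(opt_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))  # 7 faults
```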

Here I am measuring the proportion of time a page stands idle (including its loading time). So a page with a ratio of 1.0 is loaded but never used, a page with a ratio of 0.999 is used one-thousandth of the time it is in the system and a page with a ratio of 0.99 is used one-hundredth of the time it’s in the system and so on.

So, here’s the “full” plot for my data:

Full OPT plot

Not much to see – except most pages are idle most of the time: so we go closer:

90%

And closer…

0.999%

And closer…

0.9999%

That’s about the limit of the precision of the data, but it looks pretty fractal-like to me…

(I should add that the other page replacement policies I studied don’t appear to show this fractal-like pattern, exhibiting only a single peak/modal frequency.)

Procrastination time: what does this mean?


I am supposed to be writing some more stuff for my PhD thesis, but this feels like a more legitimate way to procrastinate than others…

I have plotted the performance of a model of a multi-core processor system and, because it was something to do, applied a Fourier Transform to the data (the black plots in the first chart):

2k LRU performance

So, my question is, what does the second chart mean? If it means anything of course.

I get that the real component is plotted on the x-axis and the imaginary component on the y-axis, but is there any information conveyed in the chart?
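
For anyone wanting to reproduce the style of the second chart, something along these lines in numpy/matplotlib will do it – a sketch only, with a placeholder filename rather than my actual data or script:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder: one performance measurement per simulated interval.
signal = np.loadtxt("lru_performance.txt")

coeffs = np.fft.rfft(signal)  # complex Fourier coefficients of the series

# Each coefficient plotted as a point in the complex plane.
plt.scatter(coeffs.real, coeffs.imag, s=4)
plt.xlabel("real component")
plt.ylabel("imaginary component")
plt.show()
```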

A very first draft…

Real time devices

Our proposal here is to meet head-on this need to use a memory space much greater than the locally available low-latency storage. The problems faced by programmers of today’s many- and multi-core devices are, in a sense, far from new: in the 1960s an earlier generation of programmers, experimenting with virtual memory and time-sharing computers, and using programs generated by the first generation of high-level languages, were struck down by the phenomenon of thrashing, where computing time was lost to excessive memory management [23]. The solutions we propose here draw on the lessons learned at that time, but significantly adapt the methods to account for the realities of using multicore devices with very limited local storage.

We propose to enable virtual memory [24] for such systems, using a novel form of paging which, instead of transferring whole pages across the memory hierarchy, concentrates on a minimal transfer of partial pages [50]. Here hard faults still impose a significant penalty of lost cycles but the cost of paging in and out of memory is amortised across multiple memory reads and writes. This avoids unnecessary transfer and limits congestion in the memory interconnect, which we find to be a significant determinant of program performance.

One of the lessons of the 1960s was that maintaining a program’s working set in low latency memory was the most effective way of limiting paging and thrashing [22]. Much research at that time and into the 1970s and later concentrated on effective ways to maintain the working set of pages in fast memory and innovation continued into the 1990s (and beyond), for instance with the 2Q algorithm in Linux [38]. Here, though, we establish that better performance in embedded and real-time manycore systems is unlikely to come from a better page replacement algorithm but from a reorientation towards better management of the internals of the pages.

Page replacement algorithms for general computing are generally robust [49] and, with the decreasing cost of memory, much recent research has concentrated on using large page sizes, not least to minimise TLB misses [9]. Here we return to earlier research findings which emphasise the efficiency of small pages on small systems, and we show that combining these with a FIFO page replacement policy may, perhaps counter-intuitively, deliver more time-predictable or even faster performance than attempts to use some version of a least-recently-used policy. We also show, though, that the choice of page replacement algorithm is likely to be strongly related to the variation in working set size over a program’s lifecycle.

Concerns about power have always been important for embedded devices, but the changing characteristics of hardware mean new issues such as “dark silicon” [28] will be important as we move deeper into the coming manycore era. We demonstrate here that constraints on power may be, to some extent, mitigated by a loosening of bottlenecks elsewhere in a real-time system, particularly in the memory interconnect: in a system where power is no constraint, congestion here dominates performance; where power concerns limit processor activity, they may also lessen that congestion.

Further, we present results showing that some frequency scaling in manycore systems may not limit performance if it contributes to lessening congestion in the memory interconnect: results which suggest that the dark silicon issue may not be as big a barrier to performance in real-world applications as might be expected.
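
(Outside the draft itself: for readers who haven’t met the two families of policy it compares, here is a minimal, purely generic sketch of FIFO and least-recently-used page replacement – textbook behaviour only, nothing to do with the thesis implementation.)

```python
from collections import OrderedDict, deque

def fifo_faults(refs, frames):
    """Page faults with first-in, first-out eviction."""
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.discard(queue.popleft())   # evict the oldest arrival
        resident.add(page)
        queue.append(page)
    return faults

def lru_faults(refs, frames):
    """Page faults with least-recently-used eviction."""
    resident, faults = OrderedDict(), 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)          # mark as most recently used
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)        # evict the least recently used
        resident[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), lru_faults(refs, 3))  # 9 and 10 faults respectively
```

On that particular reference string FIFO happens to beat LRU, though of course one toy example proves nothing either way.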

Reviving this blog … with a question about binary trees


Work changes and a determination to actually finish my PhD mean I really should make a bit of an effort here and so I will.

Here is a puzzle about binary trees that has been bothering me, arising from my PhD research…

In that research I am investigating how to implement virtual memory on a many-core Network-on-Chip (NoC) system. Essentially I have been building and running emulators.

The core of the emulation is the “memory tree” – the connection between the cores of the NoC and the global memory controller. The “Bluetree” memory tree is a binary tree arrangement, with the root node (level 0) connecting to the memory controller and the leaves at the NoC tiles (with each tile having a single processor).

At every non-leaf level of the tree there are multiplexors (muxes), each with a buffer that can hold a single memory packet – so if two requests for global memory arrive at once and the buffer is free, there needs to be an arbitration process to decide which packet gets to occupy the buffer and which has to wait.

We have 128 leaf nodes – as in this tree…

Binary tree with 128 leaf nodes

With this many leaf nodes, the question of the arbitration policy of the muxes is important. A naïve approach would be to say, if two packets contest for the buffer, let the packet on the left win the first time and then pass the packet on the right: after all this guarantees no starvation.

But it is actually a hopeless policy… the leftmost leaf node (we shall number from zero) faces no opposition, while the rightmost (number 127) loses out to every other node.

But leaf node number alone is not a sufficient way of describing the relative number of blocks – for instance, leaf node 16 appears to be in a better place than leaf node 15: going up the tree, leaf node 15 can be blocked at level 6 (the root node is at level 0), at level 5, level 4, level 3 and level 2, while leaf node 16 is only blocked at level 1.

Practical testing of this naïvely fair approach in the emulator gives results like the following… as can be seen, the right-hand part of the tree performs appallingly, but even there lots of variation is evident:

some test results

My question is this: what is the correct (i.e., correct mathematically) way of describing the order of the leaf nodes?