Yesterday I set a new personal best (PB) for the Finsbury Park 5k Parkrun: 23 minutes and 50 seconds – 5 seconds faster than my previous PB, set on 22 February and a full 170 seconds faster than my first (completed – there was a previous failure!) run on 29 June last year.
To beat Mo Farah in a re-run Olympics I need to shave another 10 minutes and 9 seconds – 609 seconds – off my time. Easy – at a linear rate of improvement I will manage that in another 2 years and 10 months. Rio here I come!
But why stop there: you will have noticed I have also taken 3.4 seconds off my 100 metre time. To be the fastest man in the world and beat all comers with 9.5 seconds for 100 metres I need only wait another 5 and a bit years.
Of course, I’m never even going to come close. My rate of improvement is slowing rapidly. A full 57 seconds of that 170 seconds – in other words more than one-third of it – came in the week between 29 June and 6 July last year.
Even the current rate of improvement – 5 seconds over seven weeks – probably won’t be sustained for very much longer. And even at that rate I wouldn’t be beating Mo for another 16.4 years – by which time I’ll be 65.
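Just to show the working behind that 16.4-year figure (a throwaway Python sketch of the extrapolation, not anything from the original training log):

```python
# 609 seconds still to shave off, at the current rate of
# 5 seconds gained every seven weeks.
gap_seconds = 609
seconds_per_week = 5 / 7               # current rate of improvement
weeks_needed = gap_seconds / seconds_per_week

# 852.6 weeks, or roughly 16.4 years.
print(round(weeks_needed / 52, 1))     # → 16.4
```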
Interestingly, the two series are very close to isomorphic (i.e., they have essentially the same shape if one scales the x-axis appropriately).
This is based on running a short piece of code to decode 32 frames of video. The memory model used 4k pages with a pooled 512k on chip (each core can access any part of the pool at a cost of 1 tick) and unlimited memory 100 ticks away per 128 bits read in (so faulting in a 4k page takes 25,600 ticks).
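That cost model can be sketched in a few lines (my reconstruction in Python – the constants are the ones quoted above, but the function and variable names are mine, not the simulator's):

```python
# Constants from the memory model described above.
PAGE_SIZE_BITS = 4 * 1024 * 8   # 4k pages
BUS_WIDTH_BITS = 128            # bits transferred per round trip to backing store
TICKS_PER_TRANSFER = 100        # latency of the unlimited remote memory
LOCAL_POOL_TICKS = 1            # any core reaches the 512k pool in one tick

def fault_cost_ticks(page_size_bits=PAGE_SIZE_BITS):
    """Ticks needed to fault a whole page in from the remote store."""
    transfers = page_size_bits // BUS_WIDTH_BITS
    return transfers * TICKS_PER_TRANSFER

# (4096 * 8 / 128) * 100 = 25,600 ticks, matching the figure above.
print(fault_cost_ticks())   # → 25600
```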
This is a bit of a crude simplification of the memory model seen on, say, a Tilera TILE64 system, but it’s close enough (though this simulation had 16 cores).
(Before any of my “political” friends think I have obviously suffered a serious blow to the head, I am talking about his theories on grammar and not his idiotic politics…)
In the late 1950s Noam Chomsky proposed that we have a natural capacity to process grammar and thus to use language – in essence that our brain is hard-wired to use language.
It was, and is, a controversial theory (though Chomsky would not agree), but this week new evidence has been published to support it – and, as outlined in the New Scientist, you can even conduct a thought experiment on yourself to test it.
Writing in the Proceedings of the National Academy of Sciences (the US National Academy that is), Jennifer Culbertson and David Adger consider whether language learners pick up language patterns by observing statistical patterns from existing speakers/users or – as the Chomskian theory would suggest – apply some form of “hard wired” rule to process grammar.
To do this they presented subjects (English speakers) with a “new” limited language based on common words in English. The subjects were then asked to judge whether a new phrase in this “language” – made by combining elements of the limited language they had already seen – would be correct in one of two forms. If they picked one form, they would likely be using some form of statistical inference – picking the form that looked closest to the forms they had already seen; if they picked the other, they were likely using an internal grammar machine in their brains.
And this is where you can test yourself … (shamelessly nicked from the New Scientist, as this example does not appear to be in the article itself):
Here are two phrases in the new language:
So which of the following phrases is correct in this language:
shoes two blue
shoes blue two
If, as I did, you picked “shoes blue two” and not “shoes two blue” then you are favouring a semantic hierarchy and not a frequency-based approach – in English two usually precedes blue, but blue is a stronger modifier of the noun than two.
In fact people chose the semantic hierarchy about 75% of the time – strongly suggesting that we do have an internal grammar engine running inside our heads.
(Chomsky himself appears to be dismissive of the study, despite it appearing to confirm his work – “like adding a toothpick to a mountain”. Tells you quite a lot about him, I think.)
What are the practical implications? I think it points to a limit to the effectiveness of things like big-data-based machine translation, if all that relies on is statistical inference. Inside a decade big data has made machine translation more practical than the previous 50 years of AI research managed, but the quest for a way to compute grammar is still going to matter.
This is not a story of a great debugging triumph – but it is one that points to a great debugging truth – study of the bug before you start to pore over your code is more likely to get you to a solution faster.
In OPT the idea is that, when we need to make room in main memory for a new page, we select for removal the page with the longest “reuse distance” – in other words, the one we will have to wait the longest (perhaps forever) before needing to use again. This algorithm is sometimes called the “clairvoyant” algorithm because it requires foreknowledge of which memory page will get used when. That does not happen very often in general-purpose computing, but it can often be the case in embedded computing, where the code does exactly the same thing over and over again.
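For a single reference stream, the OPT choice can be illustrated in a few lines (a generic Python sketch of Belady's rule, not the author's simulator code):

```python
def opt_victim(resident_pages, future_refs):
    """Pick the OPT/clairvoyant victim: the resident page whose next
    reference lies furthest in the future (or never comes at all)."""
    def reuse_distance(page):
        try:
            return future_refs.index(page)    # distance to next reference
        except ValueError:
            return float("inf")               # never referenced again
    return max(resident_pages, key=reuse_distance)

# Pages 1 and 2 are both needed again soon; page 3 never is, so it goes.
print(opt_victim([1, 2, 3], [1, 2, 1, 2]))   # → 3
```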
In my case I am using a memory trace from a transcoding of 32 frames of video – a small (in terms of time on a real computing device) example of the sort of parallel task you might see in embedded devices. In the real world this runs for a few seconds – but it also generates 220GB of XML trace records spread across 18 threads.
With a single thread it’s easy to work out the reuse distance – you just look at how long it will be before a page gets referenced again: you could even do this statically ahead of runtime and just look the result up if you wanted.
That is not true in multithreaded code – one thread might run fast or slow (e.g., while waiting for IO) – so the only way to do it is to measure the reuse distances for each page and for every thread:
For each thread calculate the minimum reuse distance of each page;
Then pick the page with the maximum minimum reuse distance across all threads.
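The two steps above can be sketched like this (a minimal Python reconstruction under my own assumptions about the data structures – per-thread future reference lists – rather than the simulator's actual representation):

```python
def mt_opt_victim(resident_pages, per_thread_refs):
    """Multithreaded OPT: for each page, take the MINIMUM reuse distance
    over all threads, then evict the page with the MAXIMUM such minimum."""
    def min_reuse(page):
        distances = []
        for refs in per_thread_refs:          # must scan EVERY thread
            if page in refs:
                distances.append(refs.index(page))
            else:
                distances.append(float("inf"))   # thread never needs it
        return min(distances)
    return max(resident_pages, key=min_reuse)

# Page 7 is needed immediately by thread 0 (min distance 0); page 9 is
# three references away for both threads (min distance 3) – so 9 goes.
print(mt_opt_victim([7, 9], [[7, 1, 2, 9], [1, 2, 3, 9]]))   # → 9
```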
I wrote some code that seemed to do this, and on my testing subset of the 220GB XML it seemed to deliver good results. But whenever I ran it against the full trace it started brightly but then performance – by which I mean how fast the simulator ran through the traces in terms of the synthetic ticks it generated, or bluntly the simulated performance – just seemed to get worse and worse.
In fact the longer a simulated thread seemed to be running, the worse its performance got and the “fastest” thread was always the most recently spawned one, and the more threads that ran the worse this problem got.
Now, with severely limited memory (in this case we were simulating a 16-core NoC with 32KB of local memory per core, pooled into one flat 512KB space), performance can go downhill quite badly as the thread count climbs – but this fall-off was catastrophic, and it was plain that OPT was going to come out worse than “least recently used” (LRU) – and that just cannot be correct! I have not sat down to write a mathematical proof of that, but instinctively I know it to be true…
Reading through the code did not draw my attention to any obvious flaws, so I had to sit down and think about what the bug showed me – it worked well on short runs, the most recent thread seemed to do well and the recent threads in general did better than the longer established threads.
Even writing this down now makes it seem obvious – my code was in some way biased towards more recently created threads. And so instead of searching through all the code looking for errors, I could home in on those parts of the code that scanned through each thread.
I found such an error quite quickly but testing again showed that while the problem was diminished, it was not by much – I still had not found what was really to blame.
Another scan through the key sections of the code revealed the real error: when a thread tried to page in memory it only examined the reuse distances of itself and those threads created after it.
Thread data was stored in a linked list, but instead of starting the scan at the head of the list, the scan began at the pointer to the current thread. The result was that the most recently spawned thread had a “perfect” OPT experience – on every page-in, its reuse distances were fully considered – but at the other extreme the first thread’s reuse distances were only considered when it itself came to page in memory. So the pages it used that were not used by any other thread were fair game: they appeared to have an infinite reuse distance and so were almost certainly the ones chosen for replacement more or less every time.
Changing the code so that the scan began at the head of the linked list rather than at the local pointer fixed the problem, and the OPT simulator is now running – my guess is that it will show OPT to be two to three times more efficient than LRU.
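A toy reconstruction of the bug and the fix (the `Thread` class and all names here are mine for illustration – the real simulator's code is not shown in the post):

```python
class Thread:
    """Toy stand-in for the simulator's per-thread record."""
    def __init__(self, name):
        self.name = name
        self.next = None   # singly linked list of simulated threads

def scan(start):
    """Walk the list from `start`, collecting each thread it can see."""
    seen, t = [], start
    while t is not None:
        seen.append(t.name)
        t = t.next
    return seen

# Three threads linked t0 -> t1 -> t2; suppose t2 is the current thread.
t0, t1, t2 = Thread("t0"), Thread("t1"), Thread("t2")
t0.next, t1.next = t1, t2

# The bug: scanning from the current thread's own pointer misses every
# thread created before it, so their reuse distances go unconsulted.
print(scan(t2))   # → ['t2']

# The fix: always start the scan at the head, so all threads are seen.
print(scan(t0))   # → ['t0', 't1', 't2']
```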
The fundamental difficulty with quantum mechanics is that it says we cannot know, with total accuracy, the position and momentum of a particle. This “uncertainty” is what creates the “spooky action at a distance” – because if we measure the momentum of one paired particle then uncertainty and energy conservation laws “appear” to make the other particle assume a certain state instantaneously.
In the “Copenhagen interpretation” we are essentially asked to accept that this is due to an instantaneous “collapse” of the physical laws we have been using to this point. It’s as if our quantum rules are just a window on to a “real” physical world and that our poking shakes up what is going on behind the scenes in ways we cannot hope to understand.
That’s not very convincing, though (even if it “works”).
So, what are the alternatives?
Valentini is reviving an idea of Louis de Broglie that was rejected in favour of the “Copenhagen interpretation”: namely that our paired particles remain linked by a “pilot wave” that communicates the state change instantly.
That, though, appears to offend against the physics of special relativity – we are conditioned to think such instant communication is impossible because our physics tells us that we would need infinite energy to accelerate a massive body to the speed of light: hence making that an unbreakable speed limit.
And then there is the “many worlds” interpretation – namely that all that might happen does happen and so there are an infinite number of those paired particles and “our universe” is just one of an infinite number.
None of them really seem that satisfactory an explanation.
We chose these particular case studies for several reasons. We wanted examples of totally different art forms and media; we wanted a wide historical and cultural reach; we wanted artefacts that have already been subject to extensive debate (part of the interest is in the nature of those debates); and we wanted examples that might usefully reveal different aspects of the two principal kinds of values in our study.
We are planning three intensive workshops on these case studies bringing together experts from different perspectives and disciplines: archaeologists and palaeontologists for the cave paintings, Shakespeare scholars and literary theorists for the Sonnets, film theorists and critics for the film. We are delighted, for example, that Jill Cook, who curated the highly successful exhibition on “Ice Age Art: Arrival of the Modern Mind” at the British Museum, will contribute to the Chauvet workshop. Throughout there will be an input also from aesthetics and philosophy of art. The interdisciplinary nature of the enquiry is crucial to it.
The day-long seminar on Blade Runner is on 11 April. Plus, there’s a showing: it’s not often you get to see the classics back in a theatre, so that might be of interest.