The graphs look similar but the differences are important – this one (the correct one) appears to confirm that Peter Denning's findings about the working set model versus LRU still hold, at least in broad terms – though it still suggests LRU has better performance characteristics than might be expected.
But it’s late now and I am going to bed – perhaps more later.
- The graph is wrong (cartesianproduct.wordpress.com)
Once I published the graph on the previous blog entry I more or less immediately realised it was wrong – it’s not that the curves are wrong: it’s that they are plotted on different scales.
The red line plots the lifetime curve using the working set of the process, based on pages accessed within a certain time frame – this gives an average working set size, which is plotted along the x-axis.
The blue line is the lifetime curve with a maximum working set of a fixed size (ie a simple LRU-type page replacement simulation), but it is not scaled correctly against the x-axis. Something of a waste of 31 hours of wallclock time!
Happily my 12 core box is now online and so I should have a replot done shortly – my best guess is that it may not change too much, things will be interesting if it does.
NB: This graph is wrong - read more here
This shows the ‘lifetime curve’ of Xterm using a time-based approach to determine the working set (ie a working set of variable size based on recency of access) and a fixed-size working set.
The y-axis measures the average number of instructions executed between faults.
Late last night, in a state of some despair, I pleaded for help with computing power for my MSc project. Today, having thought about it long and hard, I found a better algorithm and sped the whole thing up by a factor of 50.
Like, I am sure, many people previously in my position, my inspiration was the classic Programming Pearls.
This is a Birkbeck set text but it’s also one that I did not read when I should have last year – or rather I just read a small part of it. Luckily I am systematically going through it now and was yesterday reading about how much of a difference a better algorithm can make – not that I did not know this, but it helped provide the determination to find one.
So what did I do? Firstly, I realised that there was a specialist implementation of the Java Map interface that, as the Javadoc explicitly says, is good for LRU caches (essentially what I am seeking to model):
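That implementation is java.util.LinkedHashMap, whose Javadoc notes that a map constructed in access order is “well-suited to building LRU caches”. A minimal sketch of how it might be set up – the class name, and the mapping from page number to last-reference time, are my reconstruction of the idea, not the project’s actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PageCache {
    // The third constructor argument, accessOrder = true, makes the map
    // iterate from the least-recently accessed entry to the most-recently
    // accessed one - so the LRU page is always the first entry returned.
    private final Map<Long, Long> pages = new LinkedHashMap<>(256, 0.75f, true);

    // Record a reference to a page at the given (virtual) time.
    // put() counts as an access, so it also refreshes the entry's position.
    public void reference(long pageNumber, long time) {
        pages.put(pageNumber, time);
    }
}
```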
This sets up a last-accessed order and so means that it is no longer necessary to search through the whole Map to find the out-of-date pages. Using an Iterator over the entrySet, I only need to get to the first page that is not out-of-date and stop.
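As a sketch of what that eviction loop might look like – continuing the hypothetical PageCache above, with window standing in for the working-set time frame (both names are mine, not the project’s; java.util.Iterator joins the imports):

```java
// A method that would sit inside the PageCache sketch above.
// Evict every page last referenced before (now - window). Because the map
// iterates in access order, reference times come out in non-decreasing
// order, so we can stop at the first page still inside the window instead
// of scanning the whole map.
public void expireOldPages(long now, long window) {
    Iterator<Map.Entry<Long, Long>> it = pages.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<Long, Long> entry = it.next();
        if (entry.getValue() >= now - window) {
            break;  // this page and all after it are recent enough
        }
        it.remove();  // safe way to evict while iterating
    }
}
```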
When I was checking that this worked I noticed that temporal locality meant that in many cases the LRU page was still inside the time frame I was seeking to check – in other words, literally billions of checks for outdated pages were taking place needlessly. As pages in the cache cannot get “older” (ie their reference time cannot go backwards), the oldest page at any later time can be no older than it was when we last looked – hence if we do not check again until we reach the point at which a page bearing the current oldest reference time would need to expire, we will not miss any needed evictions.
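Sketching that second shortcut in the same hypothetical PageCache – the nextCheck field and the arithmetic are my reading of the argument above, not the project’s code:

```java
// Earliest time at which any page could next fall outside the window.
// It never needs to move backwards, because new references are always
// at least as recent as the current LRU page.
private long nextCheck = 0;

public void maybeExpire(long now, long window) {
    if (now < nextCheck) {
        return;  // the LRU page cannot have aged out yet: skip the scan
    }
    expireOldPages(now, window);
    // The surviving LRU page (first in access order) fixes the next
    // moment at which an expiry scan could possibly evict anything:
    // a page referenced at time v first expires when now > v + window.
    Iterator<Map.Entry<Long, Long>> it = pages.entrySet().iterator();
    nextCheck = it.hasNext() ? it.next().getValue() + window + 1
                             : now + window;
}
```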
The result of these two changes is a massive speed-up in the code – by a factor of 40–50.
- Help! Computing power or better algorithm required (cartesianproduct.wordpress.com)