The five minute rule etc., redux


So I made a bit of a mess with this the first time round and when it ended up on Slashdot (I had assumed that wasn’t happening, as the delay between posting and it appearing was a few days), I was left rather embarrassed.

With thanks to all those who commented and pointed out the errors – here’s my attempt to do it properly. I haven’t agreed with all the comments – e.g. it was pointed out that RAID wasn’t an option in the mid 80s so shouldn’t be part of a comparison now, but I think this is about real-world choices, so RAID would probably feature, etc…

This paper – The 5 Minute Rule for Trading Memory for Disc Accesses and the 5 Byte Rule for Trading Memory for CPU Time – looked at the economics of memory and disk space on big iron database systems in the early/mid 1980s and concluded that the financial trade-off between disk space (then costing about $20,000 for a 540MB disk, or roughly 3.7 cents per KB) and (volatile) memory (about $5 per kilobyte) favoured having enough memory to hold any data you need to access around once every five minutes.
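To see roughly where the five minutes comes from, here is a back-of-the-envelope version of the 1985 sums in Python – the roughly 15 random accesses per second per drive and the 1KB page size are assumptions for the sketch, not figures quoted above.

# Back-of-the-envelope 1985 sums, in dollars. The 15 accesses/second per
# drive and the 1KB page are assumed figures, not ones quoted in the post.
disk_price = 20_000.0            # $ for a 540MB drive
accesses_per_second = 15.0       # assumed random accesses/second per drive
memory_per_kb = 5.0              # $ per KB of RAM
page_kb = 1.0                    # assumed page size in KB

cost_per_access_per_second = disk_price / accesses_per_second   # ~$1,333
cost_to_cache_page = memory_per_kb * page_kb                    # $5

# Caching a page pays for itself once the page is accessed this often
break_even_rate = cost_to_cache_page / cost_per_access_per_second  # accesses/s
break_even_interval = 1.0 / break_even_rate                        # seconds

print(f"Break-even interval: {break_even_interval:.0f} s "
      f"(about {break_even_interval / 60:.1f} minutes)")   # roughly 4-5 minutes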

It then compared the cost of computing power (estimated at about $50,000 per MIPS) and memory – e.g. if you compress data to save memory space you will have to use additional computing power to access the data. Here the trade-off was calculated to be about 5 bytes of memory per instruction per second.

What do these comparisons look like now? The original paper explicitly ruled out applying this sort of comparison to PCs – citing limited flexibility in system design options and different economics. But we won’t be so cautious.

The five minute rule updated

Let us consider a case with 2TB SSD disks. These cost about £500 (and probably about $500, we are approximating) and let’s say we are going with a RAID 5 arrangement – so actually we need 4 ‘disks per disk’ (£2000) and the cost is then 0.0001 penny per kilobyte (about 4 orders of magnitude less than 35 years ago).

As discussed above I am using the RAID figure even though RAID wasn’t a practical option in 1985 because I think this is about real world choices not theoretical limitations.

And memory – for simplicity we are going to say 128GB of DRAM costs us £1000 (an over-estimate but fine for this sort of calculation). That means memory costs about 0.001p per KB – a fall of around six orders of magnitude.

Using the same approach as the original paper – but considering 4KB pages – and assuming that each disk supports 10,000 accesses per second, the cost works out at about 5p per access per second (5p/a/s) – about 5 – 6 orders of magnitude less than in 1985. We ignore the cost of supporting a disk controller here, but conceivably we might want to add another few pence to that figure.

If we then think that making one 4KB page resident in memory saves 1 access per second, the disk cost saved is 5p at a memory cost of 0.004p. Saving 0.1 accesses per second saves 0.5p, still at a cost of 0.004p, and the break-even point is roughly 0.0008 accesses per second – or, put another way, a page is worth keeping in memory if it is accessed at least once every 1/0.0008 = 1250 seconds, or about 20 minutes.

Alternatively, this means caching systems should aim to hold items that are accessed at least once every 20 minutes or so.
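The same sum as a quick sanity check, in pence, using the rounded figures above and assuming each £500 disk handles the 10,000 accesses per second:

# Updated break-even, in pence, using the rounded figures from the text.
disk_price_p = 500 * 100          # one £500 SSD, in pence
accesses_per_second = 10_000      # assumed reads/second per SSD
memory_per_kb_p = 0.001           # pence per KB of DRAM (rounded)
page_kb = 4                       # 4KB pages

cost_per_access_per_second = disk_price_p / accesses_per_second    # ~5p
cost_to_cache_page = memory_per_kb_p * page_kb                     # ~0.004p

break_even_rate = cost_to_cache_page / cost_per_access_per_second  # ~0.0008/s
break_even_interval = 1 / break_even_rate                          # ~1250s

print(f"{cost_per_access_per_second:.0f}p per access per second; "
      f"break-even every {break_even_interval:.0f}s "
      f"(about {break_even_interval / 60:.0f} minutes)")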

And the trade off between computing power and memory…

As mentioned above, back in 1985, computing power was estimated to cost about $50,000 per Million Instructions Per Second (MIPS). These days single core designs are essentially obsolete and so it’s harder to put a price on MIPS – good parallel software will drive much better performance from a set of 3GHz cores than hoping a single core’s 5GHz burst speed will get you there. But, as we are making estimates here, we will opt for 3000 MIPS costing £500 – so a single MIPS costs about 17p and a single instruction per second (again approximating) about 0.00002 pence. (Contrast this with a low-end microcontroller, which might give you an instruction per second for about 0.00003p or a bit less.)

Computing power has thus become about 5 orders of magnitude cheaper – but as we note above memory prices have fallen at an even faster rate.

Now an instruction per second costs about 0.00002p and a byte of memory costs roughly 0.000001p (from the 0.001p per KB above), so we need to save somewhere around 20 – 25 bytes to make spending an additional instruction per second worthwhile – meaning that cost-efficient data compression of easily accessible memory is hard to justify.
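And the equivalent check for the memory versus CPU trade-off, again in pence and using the figures above:

# Memory versus CPU break-even, in pence, using the figures from the text.
cpu_price_p = 500 * 100                  # £500 for roughly 3000 MIPS
instructions_per_second = 3000 * 10**6   # 3000 MIPS
memory_price_p = 1000 * 100              # £1000 for 128GB of DRAM
memory_bytes = 128 * 2**30

cost_per_ips = cpu_price_p / instructions_per_second   # ~0.00002p
cost_per_byte = memory_price_p / memory_bytes          # ~0.0000007p

# Spending an extra instruction per second is worth it if it saves at
# least this many bytes of memory
break_even_bytes = cost_per_ips / cost_per_byte
print(f"Break-even: about {break_even_bytes:.0f} bytes "
      f"per instruction per second")                   # roughly 20-25 bytes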

How good was my US elections model?


The honest answer, I think – see the picture – is “not very”. But I offer various excuses below.

As a reminder, the grays indicate “too close to call” – which means either candidate had a 40 – 60% chance of winning; the lightest blues or reds (pink) indicate a 60 – 75% chance of winning, the mid-shades 75 – 90% and the deepest shades 90% and over.

To generate the above picture I used Biden 51.0% and Trump 47.3% – the count is still happening in a few places and as a result Biden’s vote share might climb by perhaps another 0.1% but that won’t make a significant difference.

There is also a bit of a cheat here because the model was designed to generate a projection from an opinion poll rating and not an actual result – so it injects uncertainty based on the idea of a standard error in even the best polls (as always, I recommend Statistics Without Tears for anyone who wants to get to grips with some of the core concepts here). Without that additional fuzziness I suspect I’d be even further out.
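For illustration only – this is not the actual R model, just a minimal sketch (here in Python) of the general approach: take the national shares, shift them by a per-state offset representing that state’s historical lean relative to the national vote, treat polling error as roughly normal with a standard error, and map the resulting win probability onto the colour bands described above. The three state offsets and the 2.5-point standard error are invented for the example.

# Illustrative sketch only - not the model behind the projections above.
from statistics import NormalDist

national_dem = 51.0      # national shares used for the picture above
national_rep = 47.3
std_error = 2.5          # assumed standard error on the margin, in points

# Hypothetical per-state offsets: state margin minus national margin, in
# percentage points (positive = more Democratic-leaning than the country).
state_offset = {"Georgia": -5.0, "Wisconsin": -3.0, "Arizona": -4.5}

def win_probability(offset_pts: float) -> float:
    """P(the Democrat carries the state) under the assumptions above."""
    expected_margin = (national_dem - national_rep) + offset_pts
    return 1.0 - NormalDist(mu=expected_margin, sigma=std_error).cdf(0.0)

def band(p: float) -> str:
    """Map a win probability onto the shading bands used in the maps."""
    p = max(p, 1.0 - p)              # probability of the likelier winner
    if p < 0.60:
        return "too close to call"
    if p < 0.75:
        return "lightest shade (60-75%)"
    if p < 0.90:
        return "mid shade (75-90%)"
    return "deepest shade (90%+)"

for state, offset in state_offset.items():
    p = win_probability(offset)
    print(f"{state}: P(Dem win) = {p:.2f} -> {band(p)}")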

But despite all this I still think the basic idea works. The polls were generally pretty poor in this election, and that meant a lot of the projections I posted were out by a lot – but that isn’t the same as saying the model’s core idea – that we can use a national poll to make state-by-state estimates of the likely outcome – is flawed.

One thing that was wrong was the (small – no more than 0.5%) adjustments in the model for trends in state voting. I already knew this was wrong for Georgia – since 2016 the trend there has been strongly pro-Democratic and the model didn’t take that into account. But the model also assumed that the pro-Trump trend seen in the mid-west (and a couple of other places) in 2016 would continue – i.e. that those states would become even more “Trumpy”. If instead we assume a reversion to the mean and reverse a few signs on the trends, and nothing else, we get this for the projected outcome:

There is no movement in Arizona or North Carolina as I marked them as trending Democrat already, while Florida was trending Republican and also remains unchanged.

This looks a little more like what we actually saw (in fact the national projection is about a 51% chance of a Biden win) – though I admit it’s still not great.

In my defence, I knocked this up in not many lines of R in an evening (with a bit of extra time to smooth out some bumps later). If national polls generally prove less awful than state polls did this time, then this approach might still be of use.

Updating the five minute and the five byte rules


It’s fair to say I made quite a lot of mistakes in this article – so don’t read what follows (though the comments are of interest) – but try this fixed version.

(So I had quite a lot of errors in this blog. I hope I’ve fixed them now and apologies for the mistakes.)

This paper – The 5 Minute Rule for Trading Memory for Disc Accesses and the 5 Byte Rule for Trading Memory for CPU Time – looked at the economics of memory and disk space on big iron database systems in the early/mid 1980s and concluded that the financial trade-off between disk space (then costing about $20,000 per 540MB, or roughly 3.7 cents per KB) and (volatile) memory (about $5 per kilobyte) favoured having enough memory to hold any data you need to access around once every five minutes.

It then compared the cost of computing power (estimated at about $50,000 per MIPS) and memory – e.g. if you compress data to save memory space you will have to use additional computing power to access the data. Here the trade-off was calculated to be about 5 bytes of memory per instruction per second.

The trade off between memory and disk

Now the paper explicitly ruled out applying this sort of comparison to PCs – citing limited flexibility in system design options and different economics. But we won’t be so cautious.

So what are the trade offs today? Well let us consider a case with 2TB SSD disks. These cost about £500 (and probably about $500, we are approximating) and let’s say we are going with a RAID 5 arrangement – so actually we need 4 ‘disks per disk’ and the cost is then 0.0001 penny per kilobyte (about 4 orders of magnitude less than 35 years ago).

And memory – for simplicity we are going to say 128GB of DRAM costs us £1000 (an over-estimate but fine for this sort of calculation). That means memory costs about 0.001p per KB – so the cost of both media has fallen hugely.

To make the calculation we need to look at access times. We’ll assume that we can access (read) a 4KB page on our SSD in 100 microseconds, so we can handle 10,000 such reads a second. Four KB of memory costs us 0.004p and our disk costs us 20p per access per second (20p/a/s). So, following the logic in the original paper: storing a page in memory costs us 0.004p and saves 20p for every access per second avoided, so the break-even comes when the page is accessed about once every 5000 seconds.

Alternatively we can think of this as meaning our caching or virtual memory schemes should target those pages that are accessed every 5000 seconds – about an hour and a half! The problem with that is that it implies a memory system of roughly 16TB to be truly efficient.

The tradeoff between memory and computing power

As mentioned above, back in 1985, computing power was estimated to cost about $50,000 per Million Instructions Per Second (MIPS). These days single core designs are essentially obsolete and so it’s harder to put a price on MIPS – good parallel software will drive much better performance from a set of 3GHz cores than hoping a single core’s 5GHz burst speed will get you there. But, as we are making estimates here, we will opt for 3000 MIPS costing £500 – so a single MIPS costs about £0.17 and a single instruction per second (again approximating) about 0.00002 pence.

Roughly speaking we have a byte of memory costing 0.000001p – maybe 20 times cheaper than an instruction per second. This suggests that there isn’t much to be gained from data compression at all.

But there are some big assumptions here – again that we have all the memory we need and that there is no instruction cost in having lots of memory.

What did happen to Comp Shop


Nine years after asking “Whatever happened to Comp Shop?”, I now have the answer.

CompShop advert page 2

For those who don’t know, Comp Shop was the ground-breaking computer shop, first in New Barnet and then later also on Tottenham Court Road, that played a major part in the early days of UK microcomputing through its production and sale of the “UK 101” – which was probably the most popular hobbyist computer in Britain before Sinclair came along with the ZX80 and ZX81.

My brother and I used to spend a lot of (too much) after-school time in the New Barnet shop, where William (Bill) Wood, the ever-present chief engineer, was seemingly happy to let us look at – and even use – the Apple IIs, Commodores and other machines that were much more powerful than our own ZX80. (We did occasionally buy stuff too, but very much at the book and cassette end of the market.)

Comp Shop was a retailer run by people who knew about computers – the difference between it and some subsequent retailers is enormous, but it seemed to disappear just as the market was going into overdrive.

William has now been in touch to say that he and Chris Cary – who had founded the business but more recently left the day-to-day running (successfully) in William’s hands (Cary had his fingers in many pies and as ‘Spangles Muldoon’ remains a legendary figure in radio) – had a catastrophic falling out (repaired in later years) that saw the business closed more or less immediately.

(Advert taken from here)