A plot that helps explain why the desktop computer industry is in trouble

Desktop computer sales are falling.

Desktop (and laptop) machines are just not improving in speed at the same rate as they have historically, and so sales are on the slide – there is no need to buy a faster machine when the one you have already is not much slower than an expensive replacement.

(I am writing this on a six year old laptop – it is, relatively speaking, quite slow, but still not so slow that I have been bothered to buy a faster machine.)

This lack of increased speed up is because, although “Moore’s Law” of doubling the number of transistors on a piece of silicon still applies, the related effect of “Dennard Scaling” – which kept the power needed by those transistors under control – has broken down.

The solution for this ought to be to use more but less energy-hungry, chips (CPUs). If we can get our software to efficiently use an array of not-so-fast chips rather than a single fast chip then we can still see faster computers. This is why computers (an increasingly mobile phones, because they too will soon be in trouble) are now sold as “multicore” devices.

But this has its limits. Unless we can find better ways for these chips to co-operate in software then current hardware models limit the ability to speed machines up.

And that, sort of, is where my chart comes in.

slowWhat the chart shows is that, using current software designs, adding more processors to tackling a problem can actually decrease the speed at which the problem is processed, and certainly shows no sign of speeding it up.

The chart – not to be taken as gospel as the data is very rough and ready and not properly normalized – shows how many instructions (in a simulator) an array of processors will execute per million or so processing cycles – essentially 10 means100,000), 20 means 200,000 and so on.

But the piece of information the chart does not show is that as we move from left to right the number of processors being applied to the problem is being increased and yet the number of instructions being executed is either more or less constant or even – as the first few processors are added – falling. After about 29,000,000,000 ticks the number of processors being applied begins to fall (from a peak of 16 to 11 at the far right) but again the number of instructions being processed is close to constant on average (the red line).

This isn’t because the code is hopelessly inefficient (for instance the processors are added or removed as new threads of execution come on-stream or finish) but because they compete with each other to access the available memory. This is an example of what computer scientists call “thrashing” – as the cores force each others’ preferred pieces of memory out of the system in an effort execute their own code (and this takes time), only to have the same thing done to them. Processors spend far more time waiting for their code and data to be fetched into memory than they do actually executing it.

Unless and until this fundamental problem is fixed or bypassed then the desktop computing industry will be in recession.


A supercomputer on every desktop? Not yet, anyway

My PhD is about operating systems on Network-on-Chip (NoC) computers. I have not actually done any research yet, so don’t expect anything here – but I have been playing with some existing data and I think it gives some interesting results.

NoCs are part of the semiconductor industry’s response to the failure of “Dennard scaling”: Moore‘s law says we can double the number of transistors on a chip every 18 – 24 months and we are still able to do that. Dennard scaling was the thing that made that super useful – because it meant the power requirements for the processors stayed constant even as they acquired more transistors. Now it has broken down, building faster chips becomes that much more difficult because, bluntly, they would burn up unless we limited the power.

NoCs aim to get round this by replacing one fast and power hungry processor on a single chip with several less powerful processors on the same chip – the idea being if we can attack the problem with several slower processors we can get the job done more quickly than if we used just one faster processor.

But there’s a catch, a big one, as captured in Seymour Cray‘s question:

would you rather plough a field with two strong bulls or a thousand chickens?

NoCs do not replace one fast chip with a dozen not quite as fast chips – they parcel out the power eaten by that one chip to the whole network on the chip – it’s not quite as bad as dividing the computing power by the number of chips (for that was the case there would be no advantage at all), but it is not fantastically above that.

Using work published by Hadi Esmaeilzadeh from the University of Washington along with others from the University of Wisconsin – Madison, the University of Texas at Austin and Microsoft Research, my projection is that, if we took one of today’s fast chips and parcelled up the power, then we would see computing power decline like this:

  • One processor: 100% computing power
  • Two processors: 65% computing power each
  • Four processors: 38% computing power each
  • Eight processors: 21% computing power each
  • Sixteen processors: 11% computing power each
  • Thirty-two processors: 6% computing power each
  • Sixty-four processors: 3% computing power each

Now, 64 x 3 = 192, so that might look like quite a good deal – a 92% speed up. But it is not that simple because some part of the code, even if just the bit that starts and ends your process, can only run on one processor even if all the rest can be split into 64 equal parts. And the pieces that will only run on one processor are now 33 times slower than they were before. The key balance is this: how much code can you run at nearly twice the speed (92% speed up) versus how much do you have to run at 33 times slower than before?

The answer is that you have to run a lot of code in the fast zone before you really see a big advantage.

45nm NoC modelled

As the graph suggests you would need to have about 99.9% of your code capable of running in parallel before you saw a guaranteed speedup with 64 processors in your NoC. Plenty of such code exists – such as in image handling and so on – but you are not likely to be running too much of it on your desktop computer at any given time (except perhaps when you are running a streaming video application) and the big disadvantage is that when you are not running the parallel code you are stuck with the 3% performance.

(Actually, it’s not quite as simple as that, as you may have a faster processor equipped to do the single thread of execution stuff, but then your computer starts to become more expensive.)

In the future chips will get faster – but maybe not that much faster. In a decade’s time they could be between 400% and 34% faster than they are today, depending on whether you are optimistic or pessimistic (realistic?) about processor technologies. That will help, but still not enough to put this technology in your desktop – as opposed to your games console or your television set or maybe a specialised processor in a top of the range machine.

So don’t expect your personal supercomputer too soon.

The end of Dennard scaling

Lots of people have heard of Moore’s law and probably think it means computers get twice as fast every 18 months or two years or so.

Replica-of-first-transistor (Photo credit: Revolweb)

Of course, that’s not quite what it really says – which is that we can put twice as many transistors on a chip every two years or so. And for most of the last fifty years that was pretty much equivalent to saying that computers did get twice as fast.

(And, in reality, “Moore’s law” isn’t really a law at all – it began as an observation on the way that commercial pressures were shaping the semi-conductor market, with chip/solid state electronics manufacturers competing by making their offerings more powerful. But it is also a mistake to state, as sometimes happens, that the “law” says we can pack twice as many transistors into a given space every two years – because that is not true either. In fact in the early 70s transistor density was doubling every three years or so, but chip sizes were also increasing, so transistor numbers doubled every 18 months, but by the 1990s the pace of chip growth slowed and so the time taken to double transistor numbers increased to 24 months.)

It’s now nearly a decade since what has been called “the breakdown of Moore’s law” and the switch to multicore processors instead of ever faster single chips. But again, this is wrong. Moore’s law has not broken down at all – transistor numbers are continuing to increase. What has happened is that it is no longer possible to keep running these transistors at ever faster speeds.

Because accompanying the decrease in transistor size which was the fundamental driver what is popularly described as Moore’s law was another, related, effect – “Dennard scaling” (named after Robert Dennard who led the research team from IBM which first described this effect in a 1974 paper). The key effect of Dennard scaling was that as transistors got smaller the power density was constant – so if there was a reduction in a transistor’s linear size by 2, the power it used fell by 4 (with voltage and current both halving).

This is what has broken down – not the ability to etch smaller transistors, but the ability to drop the voltage and the current they need to operate reliably. (In fact the reduction in linear scale of transistors ran slightly ahead of the reduction of other other components at the start of this century, leading to 3+ Ghz chips a bit faster than was expected – though that is all we got to.)

What has happened is that static power losses have increased every more rapidly as a proportion of overall power supplied as voltages have dropped. And static power losses heat the chip, further increasing static power loss and threatening thermal runaway – and complete breakdown.

Reasons for increased static power loss include complex quantum effects as component size falls and the chemical composition of chips is changed to handle the smaller sizes. And there seems to be no way out. “Moore’s law”, as popularly understood in the sense of ever faster chips, is dead and even if we don’t get the science, we understand all the same – as the recent catastrophic fall in PC sales has demonstrated: nobody is interested in buying a new computer for their desktop when the one they already have is not obsolete.

Instead we are buying smaller devices (as Moore’s law makes that possible too) and researchers (me included) are looking at the alternatives. My research is not in electronics, but software for multicore devices, but the electronics researchers have not completely given up hope they can use other methods to build faster chips, but there is nothing to show for it yet (outside the lab – see the links below for some recent developments).


Update: I have edited slightly for sense and to reflect the fact that when Gordon Moore first spoke about what was later termed his “law” there were no integrated circuits available and that recent developments by IBM point to at least the potential of solving the problem by developing new materials that might eliminate or minimise static power loss (see the links below).