The long tail of a Zipf distribution

Statistical meaning of The Long Tail (Photo credit: Wikipedia)

Back in the days of the first bubble the talk was of the “long tail” – how web retailers could make a lot of money by selling small amounts of a large number of different things.

True enough, one of the great survivors of those days – Amazon – does make money from the “long tail”, and no amount of protesting on behalf of small local bookshops makes up for the fact that you can, more or less, buy any book in print (and many which aren’t) off Amazon and have it delivered to your door.

One way of describing these long tails is the “Zipf distribution”, which, in its purest form, states that the frequency of an item (as originally formulated, a word in a language or corpus) is inversely proportional to its rank in the list. In other words:

f \propto \frac{1}{R}

So the frequency of the second most frequently occurring word or thing would be half that of the most frequent, the third one third, and so on.

We can generalise this into:

f = \frac{k}{R^n} where k and n are some constants.

For instance, it is found that, for cities in many countries, the population p varies as R^{-1.07}.

The important thing about this distribution, if you are an internet sales director, is that the area under the graph can be huge – so while enormous numbers of sales can be found at the top end – think of the phenomenon of Fifty Shades of Grey earlier in 2012 – there can be plenty of money made selling The Annotated Turing too.

As a little thought experiment – assuming n = 1.07 for Amazon book sales – this means that Fifty Shades of Grey, now 34th in Amazon’s bestseller list, is probably selling something like 11,500 copies a week (compared to 500,000 at its peak), while The Annotated Turing, ranked at 35,681, is selling maybe… 6 or 7. (I am guessing, though, that n is probably greater than 1.07 for books.)
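For anyone who wants to check that back-of-envelope arithmetic, here is a minimal Python sketch. It assumes a pure power law f = k/R^n with n = 1.07 and calibrates k from the single assumed data point (rank 34 selling roughly 11,500 copies a week); the numbers are illustrative guesses from the text, not real Amazon data.

```python
def zipf_sales(rank, k, n=1.07):
    """Estimated weekly sales at a given bestseller rank, f = k / R^n."""
    return k / rank ** n

# Calibrate k from the one assumed data point:
# rank 34 sells about 11,500 copies a week.
n = 1.07
k = 11_500 * 34 ** n

print(round(zipf_sales(34, k, n)))      # 11500, by construction
print(round(zipf_sales(35_681, k, n)))  # roughly 7 - matching the "6 or 7" guess
```

A larger n would make the tail fall away faster, which is why the guess that n > 1.07 for books would push the 35,681-ranked title’s sales even lower.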

(I was in Istanbul airport recently and if the bookshops there are any guide, reports of that country’s soft Islamisation are over-cooked: Fifty Shades was piled in every corner.)

How to win at “Deal or No Deal”

Publicity photo from the television show Let’s Make a Deal. Pictured are host Monty Hall and announcer Jay Stewart with contestants. (Photo credit: Wikipedia)

I have never actually watched “Deal or No Deal”, but I am assuming it is the same as the “Let’s Make A Deal” TV gameshow described in Ian McEwan‘s spy romp Sweet Tooth – which I have just read in the last 18 hours of plane flights, mid-air turnarounds (bad weather at the destination rather than anything more serious) and airport lounges.

The novel is voiced by a Cambridge maths graduate (and low grade MI5 spook) who at one point explains to her lover the “paradox of choice” based on “Let’s Make A Deal”:

You have three boxes, one contains a prize, the two others are empty.

You pick one and then the dealer reveals one other to be empty. Do you stick with your original choice or pick the other box?

The naive answer is to stick with your original choice. After all, the odds of the prize being in the box you first picked – or in any other box – were one in three, and with one empty box now revealed they appear to have risen to one in two either way, so there seems to be no point in switching.

But that is wrong.

In fact you double your chances of winning by picking the other box.

Think of it this way. If you picked an empty box as your first choice – and the odds are that you did as there are two of them and only one with a prize – then the dealer has no choice but to reveal the other empty box and so the third box must be the one with the prize.

So the odds of you picking the prize in this way are \frac{2}{3} \times 1 = \frac{2}{3}. The odds that you picked the prize the first time round are just \frac{1}{3} .

Of course, this is not a guaranteed way to win. But you would win two times in every three games if you followed this strategy.
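If you don’t trust the algebra, the game is easy to simulate. Below is a quick Python sketch of the always-switch strategy; the box labels, seed and trial count are arbitrary choices of mine.

```python
import random

def play(switch, rng):
    """One round: three boxes, one prize; dealer opens an empty box you didn't pick."""
    prize = rng.randrange(3)
    choice = rng.randrange(3)
    # The dealer reveals an empty box that is neither the prize nor your pick.
    opened = next(b for b in range(3) if b != choice and b != prize)
    if switch:
        # Switch to the one remaining unopened box.
        choice = next(b for b in range(3) if b != choice and b != opened)
    return choice == prize

rng = random.Random(42)
trials = 100_000
wins = sum(play(switch=True, rng=rng) for _ in range(trials))
print(wins / trials)  # close to 2/3
```

Run with `switch=False` instead and the win rate drops to about one in three, exactly as the argument above predicts.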

Update: Having now read the Wikipedia entry on “Deal or No Deal” I can see it is similar to “Let’s Make a Deal”, but not the same and, in fact, rather more complex. I am not sure the above analysis of the so-called “Monty Hall Problem” (named after the presenter of “Let’s Make A Deal”) would be much of a guide to a contestant!

Plan9 on the Raspberry Pi

Glenda, the Plan 9 Bunny (Photo credit: Wikipedia)

“Plan 9 from Bell Labs” was meant to be the successor system to Unix and, like the original, was designed and built by AT&T’s engineers at Bell Labs (the title is, of course, a skit on what is supposedly the best worst-ever film – “Plan 9 from Outer Space”).

Plan 9 never really made it. Linux came along and gave us Unix for the masses on cheap hardware for free and the world moved on. (Though some of the ideas in Plan 9 were retro-fitted into Linux and other Unix-like systems.)

The increased speed of commodity computers – latterly sustained via SMP – meant that computing power that once seemed available only to an elite could be found on the High Street, and easy-to-install clustering software meant scientists and others could build supercomputers from cheap hardware and free software. The multi-computer idea at the heart of Plan 9 seemed to have been passed by as we screamed along the Moore’s Law Highway.

But now Moore’s Law is breaking down – or rather we are discovering that while the Law continues to apply – in other words we can still double the number of transistors on silicon every 18 – 24 months – other factors (heat dissipation essentially) mean we cannot translate a doubling of transistors into a computer that runs twice as fast. And so the multi-computer idea is of interest once more.

Plan 9 is not likely to be the operating system of the future. But as an actually existing multi-computer operating system it could still have a lot to teach us.

Now that it has been ported to run on the Raspberry Pi single-board computer, I have decided to order another three of these things (I already have one running as a proxy server) and use them as Plan 9 nodes. The boards should be here in about three weeks (I hope), meaning I will have them as a Christmas present to myself.

Code Club first session

At last I managed to lead my first “Code Club” session – it had a slightly chaotic start, as none of the computers we were using had Scratch installed, nor did we have access to a login that allowed us to install Scratch in the Windows “Programs” directory – but once we worked around that we all had great fun.

From the start it was obvious that Scratch made sense to the kids – they immediately grasped that the endless loop control would set the actions it enclosed to run endlessly. Of course nobody (apart from Visual Basic users?) works with similar simple graphics tools when writing an industrial strength program, but that was not the point: this is about teaching loops, conditionals and branches and so on.

The lost time at the start meant it was all a bit hurried so I do not know how much of the programming the children took in – as opposed to just ensuring that their Scratch scripts matched those in the worksheet. But on the first time out – none of the children had used Scratch before – simply being able to manipulate the programming elements was probably more than enough.

In any case, all of them were hugely enthusiastic when I told them they could install Scratch on any computer they had at home and practise on it there.

Code Club feels like a huge success to me already.

A small light gets snuffed out


Yesterday there was a brief flicker of hope that the United States might reform its ludicrous copyright and patent regimes – a hope given birth by a study paper published by the Republican Party, of all people.

Inter alia it made the perfectly correct and completely accurate observation that:

Copyright violates nearly every tenet of laissez faire capitalism. Under the current system of copyright, producers of content are entitled to a guaranteed, government instituted, government subsidized content-monopoly.


Now, though, the hope has been snuffed out. The paper has been withdrawn.

In Europe we do not suffer from some of the excesses of the US’s “intellectual property” regime. But we feel the chill wind of every regressive shift there, and our politicians – of all parties – are just as susceptible to the arguments used by vested interests to extend copyright. Recently the EU simply stole goods from the public by retrospectively lengthening copyright terms in response to big-money lobbying from “rights holders” (fronted up by Cliff Richard). There was no suggestion that any of the UK’s three major parties was in any way opposed to this licensed thievery.


What Rowland Hill really meant


A new letter from Rowland Hill, the Victorian postal reformer has been discovered:

My proposal for a universal penny post was adopted, but I greatly regret it.

It is true that my proposal, on adoption, led to vast increase in mails, and contributed to the greater wealth of the kingdom. It eliminated, at a stroke, the many complex and expensive procedures that previously delayed mails and increased costs. Through simplification it allowed commerce to plan with certainty and by increasing volumes it allowed the Royal Mail to cover the whole kingdom with the low cost, high volume trade ensuring that higher cost destinations could be reached and that certainty itself of course added to the volumes of mail as all could now rely on the service to deliver to all parts.

Yet having seen the success of my proposal I now realise that a huge opportunity to raise revenue was lost. The Royal Mail treated domestic and commercial mails in the same manner – surely they should have been entitled to a share of the profits of the commercial ventures enabled by the universal service. Further it is plain that some used the mail to send postcards while others put many sheaves inside envelopes. In retrospect the simplicity of the penny post enabled this and it would have been better to have insisted that pages inside envelopes were counted at the post office counter and letters were charged accordingly. 

Of course he did not write such nonsense. But this, in essence, is the argument being proposed by Hyosil Kim of KT (the former Korea Telecom – read their Wikipedia entry and tell me in all seriousness that you don’t think it was written by someone in their pay) and hosted on the International Telecommunication Union’s blog. What a load of rubbish.

Proposals for a new English ICT curriculum

This morning’s Times carries a full-page report – on page 3, no less (subscription required) – of the British Computer Society’s (BCS) proposals, on behalf of the Education Department, for a new ICT curriculum.

In fact the newspaper report seems to have been injected with more than a little bit of spin – The Times says that pupils should, by the age of 11 (i.e. Key Stage 2), be able to build a mobile phone app – but the draft programme for the curriculum (thankfully) says no such thing. It states pupils should be able to:

Write programs to accomplish given goals; solve problems by decomposing them into smaller parts; recognize that there may be more than one algorithm to solve a single problem; detect and fix errors in algorithms and programs.

Which is much more sensible.

(The Times also states that KS4 – GCSE level – pupils should be able to build their own languages, presumably meaning some teaching of compilers and related CS concepts such as automata, but again I can see no reference to that.)

Today’s discredited ICT curriculum concentrates on what the BCS calls “digital literacy” – basic skills at manipulating “office” products. It has cemented Microsoft’s monopoly position, stripped the UK of its historical lead in teaching kids programming skills, stifled innovation and, frankly, seen schools waste money.

The new programme appears much better, but given the tendency of existing software and hardware providers to demand that their products and paradigms be included in any curriculum, ideas that kids should be taught to build mobile phone apps or anything similar should be resisted – do we really think that today’s shared-memory, lock-controlled programming model is going to be that relevant in a decade’s time? I do not, but I can see why many companies with billions invested in existing technologies and models would want to resist the disruption that many-core technologies will bring.

Computing stands on the edge of another revolution:

As multicore chips scale to larger numbers of cores, systems are becoming increasingly complex and difficult to program. Parallel architectures expose more of the system resources to the software and ask programmers to manage them. In addition … programmers are forced to optimize for both performance and energy; a task that’s nearly impossible without knowing the exact hardware and environment in which an application will run. As machines scale to hundreds of cores, it will no longer be possible for the average programmer to understand and manage all of the constraints placed upon them.

(From Eric Lau et al’s paper “Multicore performance optimization using partner cores”, in Proceedings of the 3rd USENIX conference on Hot Topics in Parallelism, USENIX, May 2011)

Even if we do not agree with every idea expressed in the above comment, the basic argument is sound – all your programming are belong to us. A new ICT curriculum must be flexible enough to respond to the huge changes that are coming and to resist any attempt at technology lock-in. Previous stories about the ICT rethink are littered with corporate name-dropping, and the government (any government, frankly) is always too keen for corporate endorsement. So we need to beware.

The BCS programme looks like a promising start, if it can manage to avoid falling into populist traps like the one it seems to have set itself in the Times this morning.

Malware and spies

A few years ago I worked as an adviser to Georgian opposition politicians.

Irakli Alasania, Permanent Representative of Georgia in the United Nations. “On Wednesday, March 14, 2007 a special reception was held at CEC ArtsLink, (New York, NY) in honor of Ambassador Irakli Alasania, Permanent Representative of Georgia in the United Nations.” (Photo credit: Wikipedia)

Some were good people – I am particularly pleased to see Irakli Alasania take a place in the new Georgian government and hope his career goes further. Others were not so good.

But they had several things in common. Principally, they were desperately short of money: the government of the time made sure any business or other figure who funded them was subject to economic (or even physical) attack. Secondly, they knew that the government was more than willing to use the standard spy techniques of bugging and paid agents in an attempt to discredit them (one even showed me the hole in her wall where the bug was planted – see about 6’45” in the video).

One thing that they didn’t fear, though, was that their electronic communications were being tapped – the general view being that the state security services were simply not sophisticated enough to organise that.

Well, it seems that they may have been wrong: following a typically dirty and vicious election campaign (which the previous government lost) several senior figures have now been charged with using malware to take control of opposition politicians’ computers as a way of building up evidence against them.

(The election campaign came to life when the opposition released video of prisoners being violently and horrifically sexually abused – in the days that followed the government hurriedly released some counter videos of its own which purported to show opposition politicians contemplating deals with “criminals-in-law” (ie., mafia-type figures) – several of which were fairly crude and obvious disjointed edits.)



Fog of confusion still surrounds “biggest agile project ever”

The DWP’s Universal Credit scheme is widely acknowledged to be the biggest and most ambitious “agile” software development, ever.

Iain Duncan Smith, London, March 2010 (Photo credit: Wikipedia)

It is meant to go live in October 2013, yet reports differ as to whether it is all on track or on the verge of total collapse.


Of course, agile projects are meant to be ones that deliver product for testing all the time, but nobody seems to have seen any Universal Credit code operating in the wild – nor are there any visible signs that any “stakeholders” (sorry, but that word does as well as any other) outside the DWP have been involved in testing the code and giving feedback to the development team.


In the last week two leading figures in the project were shifted off the team – but one of them, the DWP’s corporate director of major IT projects, Steve Dover, has insisted all is going to plan.


As the project gets closer to October 2013 the political stakes get ever higher – the Cabinet minister in charge, Iain Duncan Smith (pictured), a former Conservative Party leader who still has much support on the ever more confident right of that party, remains fully committed despite persistent reports that the Treasury are worried that the whole thing is tottering on the brink of collapse.