The long tail of a Zipf distribution

Statistical meaning of The Long Tail (Photo credit: Wikipedia)

Back in the days of the first bubble the talk was of the “long tail” – how web retailers could make a lot of money by selling small amounts of a large number of different things.

True enough, one of the great survivors of those days – Amazon – does make money from the “long tail”, and no amount of protesting on behalf of small local bookshops makes up for the fact that you can, more or less, buy any book in print (and many which aren’t) from Amazon and have it delivered to your door.

One way of describing these long tails is the “Zipf distribution” which, in its purest form, states that the frequency of an item (as originally formulated, a word in a language or corpus) is inversely proportional to its rank in the list. In other words:

f \propto \frac{1}{R}

So the frequency of the second most frequently occurring word or thing would be half that of the most frequently occurring, the third would be one third, and so on.

We can generalise this into:

f = \frac{k}{R^n} where k and n are some constants.
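
Taking logarithms of both sides (a step not spelled out above, but one that follows directly from the formula) shows why data like this is usually drawn on log-log axes: the curve becomes a straight line whose slope is -n and whose intercept is \log k.

\log f = \log k - n \log R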

For instance, it is found that for cities in many countries the population, p, varies as R^{-1.07}.

The important thing about this distribution, if you are an internet sales director, is that the space under the graph can be huge – so that while enormous numbers of sales can be found at the top end – think of the phenomenon of Fifty Shades of Grey earlier in 2012 – there can be plenty of money made selling The Annotated Turing too.
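
To get a feel for how much of "the space under the graph" sits in the tail, here is a minimal Python sketch. It assumes n = 1.07 (borrowed from the cities example), and the catalogue of one million titles and the cut-off of the top 100 are purely hypothetical figures chosen for illustration, not anything Amazon has published.

```python
def zipf_sales(rank, k=1.0, n=1.07):
    """Relative sales of the item at a given rank under f = k / R**n."""
    return k / rank ** n


def tail_share(head_size, catalogue_size, n=1.07):
    """Fraction of total sales coming from items ranked below the top `head_size`."""
    total = sum(zipf_sales(r, n=n) for r in range(1, catalogue_size + 1))
    head = sum(zipf_sales(r, n=n) for r in range(1, head_size + 1))
    return (total - head) / total


# Hypothetical catalogue: one million titles, "head" = the 100 best sellers.
share = tail_share(head_size=100, catalogue_size=1_000_000)
print(f"Share of sales outside the top 100: {share:.0%}")
```

Under these assumptions roughly half of all sales come from outside the top 100, which is the point of the long tail. A larger n would shift the balance back towards the best sellers.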

As a little thought experiment – assuming n=1.07 for Amazon/book sales, this means that Fifty Shades of Grey, now 34th in Amazon’s best seller list, is probably selling something like 11,500 copies a week (compared to 500,000 at its peak), while The Annotated Turing, ranked at 35,681, is selling maybe… 6 or 7. (I am guessing, though, that n is probably greater than 1.07 for books.)
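
The arithmetic behind those figures can be sketched in a few lines of Python. The one assumption that needs stating is that k is taken to be the 500,000 peak weekly figure, treated as the sales of a title ranked number one. The function name is just an illustrative label.

```python
def estimated_weekly_sales(rank, top_sales=500_000, n=1.07):
    """Estimate weekly sales at a given rank using f = k / R**n.

    Assumes k (top_sales) is the weekly sales of the number one title.
    """
    return top_sales / rank ** n


fifty_shades = estimated_weekly_sales(34)          # rank 34
annotated_turing = estimated_weekly_sales(35_681)  # rank 35,681

print(f"Fifty Shades of Grey: about {fifty_shades:,.0f} copies a week")
print(f"The Annotated Turing: about {annotated_turing:.1f} copies a week")
```

This gives roughly 11,500 for rank 34 and between 6 and 7 for rank 35,681. Raising n above 1.07 pushes both estimates down, and the lower-ranked title falls much faster.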

(I was in Istanbul airport recently and if the bookshops there are any guide, reports of that country’s soft Islamisation are over-cooked: Fifty Shades was piled in every corner.)

Power law growth is not the only way on the internet

Sometimes it is difficult to understand what to think about the internet as a transformative medium.

We can see the Arab Spring and the way in which networked and social media have broken down the monopolies of power and information in authoritarian societies (though do not forget the way the Iranian regime used social media to pick off its opponents too). But in Britain I can also see that, despite a lot of hype and hoopla, the “traditional” media – broadcast news and print journalism – still count for more than any and every blog, even though the internet is continually reshaping these outlets too (or, in the case of print, slowly strangling it to death).

I think one of the barriers to understanding the real impact of the internet on communications is what seems to be the need of the growing army of social media consultants to deploy the hyperbolic. The famous (and utterly compelling) video shown below is just one example.

But the reality is that one can build a decent presence on the internet without looking for explosive growth, viral spread and power law dynamics. This – the long tail – is typified by this blog. I do not claim that anything written here is driving the news agenda in Britain, nor is readership growing exponentially. But it is growing in what appears to be a linear fashion.

I have had a few stories picked up by Slashdot and occasionally by one or two other influential tweeters and similar, which cause spikes in readership of particular pages. So I looked at the graph of readers of the home page.

Home page view numbers

There are still spikes from the wash-over of the big hits, but much more important for me is the steady growth in the core readership. Already in 2012 the hits on the home page (3,721) are comparable to the total for 2011 (4,215), despite the two big peaks you can see at the end of February and start of September of last year. Much of that traffic is search engine driven. Across the site as a whole Slashdot has been the top referrer (15,900 views), with all search engines managing 9,646 referrals, but that is way ahead of Twitter (1,914) and Facebook (479).

Of course, it would be great if the blog “went viral” and millions were coming here to read about hex editors and domain-specific languages. But that is never likely, so the steady growth is a healthy sign, I think, that I must be getting something right. It also ought to remind social media boosters that theirs is not the only way.