# The long tail of a Zipf distribution

Back in the days of the first dot-com bubble, the talk was of the “long tail” – how web retailers could make a lot of money by selling small amounts of a large number of different things.

True enough, one of the great survivors of those days – Amazon – does make money from the “long tail”, and no amount of protesting on behalf of small local bookshops makes up for the fact that you can, more or less, buy any book in print (and many that aren’t) from Amazon and have it delivered to your door.

One way of describing these long tails is the “Zipf distribution” which, in its purest form, states that the frequency of an item (as originally formulated, a word in a language or corpus) is inversely proportional to its rank in the list. In other words:

$f \propto \frac{1}{R}$

So the second most frequent word (or thing) would occur half as often as the most frequent, the third a third as often, and so on.

We can generalise this into:

$f = \frac{k}{R^n}$ where $k$ and $n$ are some constants.
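A minimal Python sketch of the generalised formula may help. The function name `zipf_frequency` is my own invention for illustration; with the defaults $k = 1$ and $n = 1$ it reproduces the pure Zipf pattern of a half, a third, a quarter and so on.

```python
def zipf_frequency(rank: int, k: float = 1.0, n: float = 1.0) -> float:
    """Frequency of the item at the given rank under f = k / R**n."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return k / rank ** n


# Pure Zipf (k = 1, n = 1): frequencies relative to the top-ranked item.
for r in range(1, 5):
    print(r, zipf_frequency(r))
```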

For instance, for cities in many countries, the population $p$ has been found to vary as $R^{-1.07}$.
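As a worked illustration of what an exponent of 1.07 implies (my own arithmetic, not figures from any census), compare the second- and tenth-largest cities with the largest:

$$\frac{p_2}{p_1} = 2^{-1.07} \approx 0.48, \qquad \frac{p_{10}}{p_1} = 10^{-1.07} \approx 0.085$$

So the second city would have a little under half the population of the first, slightly less than the exact half that pure Zipf ($n = 1$) would predict.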

The important thing about this distribution, if you are an internet sales director, is that the area under the tail of the curve can be huge – so that while enormous numbers of sales can be found at the top end (think of the Fifty Shades of Grey phenomenon earlier in 2012), there can be plenty of money made selling The Annotated Turing too.
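To put a rough number on the area under the tail, the sketch below sums $k/R^n$ over a hypothetical catalogue of one million titles and compares the top 100 ranks with everything else. The catalogue size and the top-100 cut-off are assumptions chosen for illustration, not Amazon's figures.

```python
def zipf_total(first: int, last: int, n: float, k: float = 1.0) -> float:
    """Sum of k / R**n for ranks first..last inclusive."""
    return sum(k / r ** n for r in range(first, last + 1))


CATALOGUE_SIZE = 1_000_000  # hypothetical number of distinct titles
HEAD = 100                  # hypothetical cut-off for the "bestsellers"

for n in (1.0, 1.07):
    head = zipf_total(1, HEAD, n)
    tail = zipf_total(HEAD + 1, CATALOGUE_SIZE, n)
    share = tail / (head + tail)
    print(f"n = {n}: tail share of total sales = {share:.0%}")
```

Trying different values of `n` shows how sensitive the tail's share is to the exponent: the steeper the fall-off, the more the bestsellers dominate.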

As a little thought experiment, assume $n = 1.07$ for Amazon book sales and take Fifty Shades of Grey's peak of 500,000 copies a week as the sales at rank 1, so that $k = 500{,}000$. Then Fifty Shades, now 34th in Amazon’s bestseller list, is probably selling something like 11,500 copies a week, while The Annotated Turing, at rank 35,681, is selling maybe… 6 or 7. (I suspect, though, that $n$ is greater than 1.07 for books.)
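The arithmetic behind these estimates can be checked in a few lines. This sketch assumes, as the figures above imply, that Fifty Shades' peak of 500,000 copies a week was achieved at rank 1, so $k = 500{,}000$; the exponent and the two ranks are those quoted in the text, and `weekly_sales` is a hypothetical helper name.

```python
K = 500_000  # assumed weekly sales at rank 1 (Fifty Shades at its peak)
N = 1.07     # assumed Zipf exponent for book sales


def weekly_sales(rank: int, k: float = K, n: float = N) -> float:
    """Estimated weekly sales for a given sales rank, f = k / R**n."""
    return k / rank ** n


print(round(weekly_sales(34)))      # Fifty Shades of Grey at rank 34
print(round(weekly_sales(35_681)))  # The Annotated Turing at rank 35,681
```

Because $k$ is pinned at rank 1, a larger exponent lowers both estimates, and the long-tail title's estimate falls proportionally much further.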

(I was in Istanbul airport recently, and if the bookshops there are any guide, reports of that country’s soft Islamisation are overcooked: Fifty Shades was piled high in every corner.)