## Representing one terabit of information

I picked up my copy of Roger Penrose‘s Cycles of Time: An Extraordinary New View of the Universeand reread his opening chapter on the nature of entropy (if you struggle with this concept as a student then I recommend the book for this part alone – his exposition is brilliant).

My next thought was to think if I could model his comments about a layer of red paint sitting on top of blue paint – namely could I just write a program that showed how random molecular movements would eventually turn the whole thing purple. Seemed like a nice little programming project to while a way a few hours with – and the colours would be lovely too.

Following Penrose’s outline we would look at a ‘box’ (as modelled below) containing 1000 molecules. The box would only be purple if 499, 500 or 501 molecules were red (or blue), otherwise it would be shaded blue or red.

And we would have 1000 of these boxes along each axis – in other words $10^9$ boxes and $10^{12}$ molecules – with $5 \times 10^8$ boxes starting as blue (or red).

Then on each iteration we could move, say $10^6$ molecules and see what happens.

But while the maths of this is simple, the storage problem is not – even if we just had a bit per molecule it is one terabit of data to store and page in and out on every iteration.

I cannot see how compression would help – initially the representation would be highly compressible as all data would be a long series of 1s followed by a long series of 0s. But as we drove through the iterations and the entropy increased that would break down – that is the whole point, after all.

I could go for a much smaller simulation but that misses demonstrating Penrose’s point – that even with the highly constrained categorisation of what constitutes purple, the mixture turns purple and stays that way.

So, algorithm gurus – tell me how to solve this one?

Update: Redrew the box to reflect the rules of geometry!

## Entropy and randomness in communications, a quick introduction

Last year I remarked on how Roger Penrose‘s discussion of entropy in Cycles of Time did more to explain the physical nature of the concept (to me, at least) than many years of formal education.

And now, another book, Information and Coding Theory has given me some real insight into how computers deliver pseudo-randomness and why they look for sources of entropy.

The maths of all this are deceptively simple, but are of fundamental importance in information theory.

First of all, we can think of what sort of information events convey. There is a famous aphorism about the habits of bears, the direction from which the Sun rises and the title of the head of the Catholic Church. No newspaper or TV bulletin would ever bother to tell us, on a daily basis, that the Sun has indeed risen from the East. In effect that event conveys no information because it is completely expected. Toss a coin, though, and we do not know the outcome in advance – the result is of interest because the outcome is not known in advance.

Hence we can posit an information function $I(e)$ which tells us how much information we can derive from event $e$, or, if we think of symbols in a data stream we can call this $I(s_i)$ – the amount of information we can derive if the next symbol we see is $s_i$.

From our earlier discussion we can see that the result of this function is dependent on the probability of the event occurring or the symbol being received – i.e. if we knew the Sun was always going to rise in the East we can derive no information from that very thing happening. Hence:

$I(s_i)$ is a function of  $p_i$ where $p_i$ is the probability that the next symbol to be seen is $i$. As we will only consider ‘memoryless’ events we will also insist that $p_i$ is independent of time or any previously seen symbol. Hence:

$I(s_is_j)$ (i.e. the information we can gain from seeing the symbol $s_i$ followed by the symbol $s_j$ is $I(s_i) + I(s_j)$

Taking these points together we can define $I(s_i) = log\frac{1}{p_i} = -log(p_i)$

We can see that $I$ tends to infinity if the probability of an event tends to zero (think of the Sun rising in the West!) and tends to 0 if nothing unexpected happens.

The entropy of a system (communication) $H(S)$ then becomes:

$H(S) = \sum\limits_i^q p_iI(s_i) = \sum\limits_i^q p_ilog\frac{1}{p_i} = -\sum\limits_i^q p_i log(p_i)$