Representing one terabit of information

I picked up my copy of Roger Penrose‘s Cycles of Time: An Extraordinary New View of the Universeand reread his opening chapter on the nature of entropy (if you struggle with this concept as a student then I recommend the book for this part alone – his exposition is brilliant).

My next thought was to think if I could model his comments about a layer of red paint sitting on top of blue paint – namely could I just write a program that showed how random molecular movements would eventually turn the whole thing purple. Seemed like a nice little programming project to while a way a few hours with – and the colours would be lovely too.

Following Penrose’s outline we would look at a ‘box’ (as modelled below) containing 1000 molecules. The box would only be purple if 499, 500 or 501 molecules were red (or blue), otherwise it would be shaded blue or red.

three dimensional grid

And we would have 1000 of these boxes along each axis – in other words 10^9 boxes and 10^{12} molecules – with 5 \times 10^8 boxes starting as blue (or red).

Then on each iteration we could move, say 10^6 molecules and see what happens.

But while the maths of this is simple, the storage problem is not – even if we just had a bit per molecule it is one terabit of data to store and page in and out on every iteration.

I cannot see how compression would help – initially the representation would be highly compressible as all data would be a long series of 1s followed by a long series of 0s. But as we drove through the iterations and the entropy increased that would break down – that is the whole point, after all.

I could go for a much smaller simulation but that misses demonstrating Penrose’s point – that even with the highly constrained categorisation of what constitutes purple, the mixture turns purple and stays that way.

So, algorithm gurus – tell me how to solve this one?

Update: Redrew the box to reflect the rules of geometry!