Representing one terabit of information


I picked up my copy of Roger Penrose‘s Cycles of Time: An Extraordinary New View of the Universeand reread his opening chapter on the nature of entropy (if you struggle with this concept as a student then I recommend the book for this part alone – his exposition is brilliant).

My next thought was to think if I could model his comments about a layer of red paint sitting on top of blue paint – namely could I just write a program that showed how random molecular movements would eventually turn the whole thing purple. Seemed like a nice little programming project to while a way a few hours with – and the colours would be lovely too.

Following Penrose’s outline we would look at a ‘box’ (as modelled below) containing 1000 molecules. The box would only be purple if 499, 500 or 501 molecules were red (or blue), otherwise it would be shaded blue or red.

three dimensional grid

And we would have 1000 of these boxes along each axis – in other words 10^9 boxes and 10^{12} molecules – with 5 \times 10^8 boxes starting as blue (or red).

Then on each iteration we could move, say 10^6 molecules and see what happens.

But while the maths of this is simple, the storage problem is not – even if we just had a bit per molecule it is one terabit of data to store and page in and out on every iteration.

I cannot see how compression would help – initially the representation would be highly compressible as all data would be a long series of 1s followed by a long series of 0s. But as we drove through the iterations and the entropy increased that would break down – that is the whole point, after all.

I could go for a much smaller simulation but that misses demonstrating Penrose’s point – that even with the highly constrained categorisation of what constitutes purple, the mixture turns purple and stays that way.

So, algorithm gurus – tell me how to solve this one?

Update: Redrew the box to reflect the rules of geometry!

9 thoughts on “Representing one terabit of information

  1. I don’t actually understand your model. Is a “box” two dimensional (a rectangle) or three dimensional (a box)? Why would you array a box along each axis?

    I would naively assume that a rectangle would need to be arrayed in one dimension to fill some 3-space and a box would already fill some 3-space.

    Also, the way you describe it sounds like you are only filling along an axis. I bet you want the result to be a box, not a 3-D cross.

    On my screen, your diagram isn’t coloured red and blue. Should it be?

    • Hi Hugh, sorry – the graphic was just me mucking about with a bit of Metapost. But yes, it’s a 2-D representation of a three dimensional box – I may twiddle the Metapost script to make it more obvious. Each rectangle would be 10 x 10 x 10 – ie 1000 molecules. At the start half the boxes would be all blue and half would be all red. Then we’d move 1000000 molecules by one place on each iteration. The box is just black and white above.

  2. Rather than storing the state of every molecule, why not store a simple count of the red molecules in each box? I assume that the “moving” operation will be somehow random, so you will choose some number of molecules from a given box and move them elsewhere. If that box is 80% red before the move, assume that 80% of the selected molecules which move are red, with some care given to rounding.

    But 1000 molecules per box may not be enough to demonstrate Penrose’s point: “when n gets very large, we find that the vast majority of the ball arrangements do satisfy this condition” — within a tenth of a percent of a perfect balance of red and blue.

    If we choose 1000 molecules from an unlimited supply of half red, half blue molecules, the likelihood is not very good that the outcome will be exactly half red, or within one molecule of a perfect split.

    The probability of choosing 500 red molecules in a row, then 500 blue molecules, is small: 0.5^1000, about 10^-301. The number of ways this set of molecules can be ordered is 1000 choose 500, about 2.7 x 10^299. So the probability of getting an exact 50/50 split of red and blue molecules is the product, about 0.025. The cases for 499 red molecules and 501 red molecules are very little different, all about 2.5%.

    These outcomes are the most likely ones, but together have less than 10% chance of occuring in a given box. So we expect most of the boxes will appear red or blue, and fewer than one in ten will appear (by the strict definition) purple.

    • Thanks.
      I had thought about just storing the count in each of the boxes – but how do you determine whether a molecule has moved out of the box or not? The molecules have 18 degrees of freedom (if I am correct in my scribbled calculations) – I am proposing they move one cell at a time, not from one box to another. We need to know when they have moved from one box to another box.

  3. The box of molecules is an abstraction already, so I would be comfortable with random assignment: simply iterate through all the pairs of cubes which are neighbors at each turn, and assume that some constant percentage of the molecules will be swapped between them. This assumes that the molecules in any given box are uniformly mixed, which is true at the beginning at least. In any case, Penrose is not waiting around for Brownian motion to mix the colors, he’s stirring the pot with a stick.

    If you want to respect the individuality of molecules, it does seem that storage will be tricky. Perhaps you could save space by storing a count of the red molecules that are a certain distance from the edge of a cube, since it’s only when they get to the edge that a change we care about can happen.

    There are 8 molecules which are 4 steps from the nearest edge, 56 which are 3 steps away, 152 which are 2 steps away, 296 which are 1 step away, and 386 on the edge (alternate values of OEIS A005897). Ignoring some untidiness at corners, the algorithm could specify that a molecule has a 1/3 chance of moving outward, 1/3 chance of moving inward, and 1/3 chance of staying in the same layer.

    If integers are stored with 32 bits, five of them take only 160 bits, compared to the bitmap of 1000 bits you need to record the color of each molecule.

    A description of 10^9 boxes can then be stored in about 150GB.

    • Thanks again.

      You are right – and I was thinking this only yesterday – that I could just average over all the boxes. The only issue was the edge effect you discuss above, which I’ll give some thought to.
      You are right too that he does talk about stirring with a stick – but I thought I’d give that a miss. To the extent that this code – if and when I write it (I guess I’ll have to now after this discussion :)) – is of any use it would be as an educational tool to show that Brownian motion will mix different gases or liquids – when I was 11 my science teacher demonstrated this with some bromine, this would be an electronic equivalent.

      • The initial state of the system is symmetric horizontally, right? It’s a layer of red, 500 boxes high, sitting on a layer of blue the same height. So you could really save some work and just simulate the vertical diffusion in a single column of 1000 boxes. This ignores horizontal movement of molecules, but those movements do not contribute to the effect we’re interested in.

Comments are closed.