Coin tossing conundrum


This is from The Mathematics of Various Entertaining Subjects: Research in Recreational Math.

It left me so puzzled that it took me a while to get my head around it.

It’s a game – Flipping Fun – and the idea is that participants each pick a sequence of three coin tosses (e.g. THH, HHH, THT and so on) and the winner is the person whose sequence comes up first.

The mind-bending part of this is:

  • The coin is fair so the odds of it turning up as heads or tails on any toss are the same (1/2).
  • Thus any sequence of a given length is equally likely – i.e. if the coin is tossed ten times then HHHHHHHHHH is as likely as HTHTHTHTHT or any other sequence.
  • Despite both of the above facts some sequences of three tosses are much more likely to win than others.

To illustrate this point, think of HHT against TTT. Here HHT is much more likely to win – because once two heads have been tossed (odds 1/4) then it is only a question of waiting for a tail to come up, as tossing a further head keeps the HH sequence alive.

So, for instance, within five tosses HHT wins with probability 12/32 (3/8), while TTT wins with probability just 7/32 (in the remaining 13 cases neither sequence has yet appeared) – figures you can confirm by enumerating all 32 possible sequences of five tosses.
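In fact the asymmetry persists however long the game runs: working through the Markov chain shows HHT beating TTT seven times in ten. Here is a minimal sketch – my own, not from the book – that checks both the five-toss counts and the long-run odds:

```python
# A minimal sketch: enumerate all 32 sequences of five fair-coin tosses and
# see which of HHT / TTT completes first, then estimate the long-run winner
# by simulation.
from itertools import product
import random

def winner(tosses, a="HHT", b="TTT"):
    """Return 'a', 'b' or None depending on which pattern completes first."""
    s = ""
    for t in tosses:
        s += t
        if s.endswith(a):
            return "a"
        if s.endswith(b):
            return "b"
    return None

# Exhaustive count over all sequences of five tosses.
counts = {"a": 0, "b": 0, None: 0}
for seq in product("HT", repeat=5):
    counts[winner(seq)] += 1
print("HHT wins in", counts["a"], "of 32; TTT wins in", counts["b"], "of 32")

# Monte Carlo estimate of the long-run winning probability (no toss limit).
def play(a="HHT", b="TTT"):
    s = ""
    while True:
        s += random.choice("HT")
        if s.endswith(a):
            return "a"
        if s.endswith(b):
            return "b"

trials = 100_000
wins = sum(play() == "a" for _ in range(trials))
print("HHT beats TTT in roughly", wins / trials, "of games")  # about 0.7
```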

What’s the evidence your numbers are faked?


Currently reading: Ten Great Ideas About Chance.

It can be a bit of a frustrating read – a lot of the chapters about gambling and judgement seem, to me, to be poorly explained – especially if you’ve ever actually placed a bet on anything (I am far from a regular gambler – I’m not sure I’ve placed a bet on anything in the last decade – but I still know how it works).

But it also has lots of exceptionally interesting stuff: not least the revelation that in naturally occurring sets of numbers the leading digit is 1 about 0.3 of the time (0.301, in fact – this is Benford’s law), whilst fraudsters (including election fixers and presumably scientific experiment fixers) tend to use numbers beginning with 1 only 0.111… (1/9) of the time.

So, taking my life in my hands – what about the results in my PhD? (Read the thesis here.) I am actually calculating this as I go along – so I really do hope it doesn’t suggest fraud!

Anyway – first table, which has a calculated SPECmark figure based on other researchers’ work – so not really my findings, though the calculations are mine.

Numbers are: 35, 22.76, 13.41, 7.26, 3.77, 1.94, 1.01, 0.55, 0.31, 0.20 and 0.14

And (phew) that means 4 out of 11 (0.364) begin with 1, so it looks fraud-free on this test.
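For anyone who wants to repeat the exercise on the lists below, this is the sort of check I am doing – a minimal sketch of my own (the helper names are just for illustration):

```python
# Count what share of a list of figures has leading significant digit 1 and
# compare with Benford's expectation of log10(2) ≈ 0.301 (fraudsters,
# reportedly, manage only about 1/9 ≈ 0.111).
import math

def first_digit(v):
    """First significant digit of a positive number."""
    v = abs(v)
    while v < 1:
        v *= 10
    while v >= 10:
        v /= 10
    return int(v)

def leading_one_share(values):
    """Proportion of non-zero values whose first significant digit is 1."""
    nonzero = [v for v in values if v != 0]
    return sum(1 for v in nonzero if first_digit(v) == 1) / len(nonzero)

specmarks = [35, 22.76, 13.41, 7.26, 3.77, 1.94, 1.01, 0.55, 0.31, 0.20, 0.14]
print(leading_one_share(specmarks), math.log10(2))   # 0.364 vs 0.301
```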

But as I say, those are my calculations, but they are not my findings at root.

So moving on to some real findings.

First, the maximum total time it takes for a simulation to complete a set of benchmarks, measured in simulated cycles: 43361267, 1428821, 1400909, 5760495, 11037274, 2072418, 145291356, 5012280.

Here fully half of the numbers begin with 1. Hmmmm.

The mean time for the same: 42368816, 1416176, 1388897, 5646235, 10964717, 2026200, 143276995, 4733750.

Again, half begin with one.

What about the numbers within the results – in this case the number of cycles lost waiting on other requests?

The numbers are: 35921194, 870627, 844281, 4364623, 1088954, 1446305, 110996151, 3420685

Now the proportion has fallen to 3/8 (0.375) and I can feel a bit more comfortable (not that the other results suggested I was following the fraudsters’ one-ninth pattern in any case.)

Later I produce numbers for an optimised system. How do they perform?

The maxima now become: 2514678, 1357224, 1316858, 3840749, 10929818, 1528350, 102077157, 3202193.

So the proportion beginning with 1 has actually risen to 5/8.

And the means show a similar pattern. Worse news with the blocks though. They now become: 730514, 775433, 735726, 2524815, 806768, 952774, 64982307, 1775537. So I am left with 1/8 (0.125) starting with 1 – something dangerously close to 1/9.

Can I save myself? I hope so … the figures above are for the first iteration of the system, but when we look at subsequent iterations a different pattern (for the blocks) emerges: 130736, 0, 0, 1612131, 97209, 232131, 64450433, 1117599.

This is now back to 3/8.

Of course, I didn’t cheat, and I also suspect (I would, wouldn’t I?) that the block count is the better test of whether I had – the overall execution time of the benchmarks is, in some sense, a function of how long their execution path is, and that is effectively determined in advance, whereas the blocking in the system is an emergent property: if I had faked the whole thing I would have had less to go on and would have been much more likely to simply make those numbers up.

Well, that’s my story and I’m sticking to it.



The Sleeping Beauty Controversy


I am reading “The Best Writing on Mathematics 2018” and amongst the various fascinating articles is one on “The Sleeping Beauty Controversy” by Peter Winkler.

The problem

(Here’s a video link – which examines the problem but isn’t related directly to Peter Winkler’s piece)

To steal Professor Winkler’s own description from the article:

Sleeping Beauty agrees to the following experiment. On Sunday she is put to sleep, and a fair coin is flipped. If it comes up heads, she is awakened on Monday morning; if tails, she is awakened on Monday morning and again on Tuesday morning. In all cases, she is not told the day of the week, is put back to sleep shortly after, and will have no memory of any Monday or Tuesday awakenings.

When Sleeping Beauty is awakened on Monday or Tuesday, what – to her – is the probability that the coin came up heads?

The bulk of mathematicians are “thirders” – they think the correct answer is 1/3. But a sizable minority are “halfers” – believing the correct answer is 1/2. Still others think the question is undecidable.

For what it’s worth I am with the thirders – Sleeping Beauty would understand that if, for instance, the experiment were repeated 100 times she’d be woken around 150 times, only about 50 of those awakenings following a head, and so the chance that any given awakening is the result of a head is one in three.
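That frequency argument is easy to check by brute force – a minimal sketch of my own, not from Winkler’s article:

```python
# Run the experiment many times and ask: of all awakenings, what fraction
# follow a head?
import random

experiments = 100_000
awakenings = 0
awakenings_after_heads = 0
for _ in range(experiments):
    if random.random() < 0.5:        # heads: woken on Monday only
        awakenings += 1
        awakenings_after_heads += 1
    else:                            # tails: woken on Monday and Tuesday
        awakenings += 2

print(awakenings_after_heads / awakenings)   # comes out close to 1/3
```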

The problem restated?

But what really struck me in the article was what Professor Winkler says is a rephrasing (though admits some might disagree) of Sleeping Beauty – because this leads me to a “halfer” position:

Alice, Bob and Charlie have each taken a new sleeping pill. In hospital experiments half the subjects slept through the night… but the other half woke up once in the middle of the night then returned to sleep and woke up in the morning with no memory of the night awakening.

Alice wakes up in the middle of the night… her credence that the pill has worked drops to zero… Bob wakes up in the morning, his credence that the pill worked remains at 1/2…

Charlie, who has blackout shades in his bedroom, wakes up not knowing whether it is morning. According to the thirders, his credence in the efficacy of the pill is 1/3 until he raises the shades…

But how can this be? Charlie has no additional information beyond the fact that he has woken up (as he would have done in any case), so surely his prior – that the pill works with probability 1/2 – remains unchanged?

(Here’s another link to a different article on this question.)


Mathematicians: please help!


I am still stuck on a problem with the M/G/1 queue: not quite the same as my original problem (discussed here), as I understand that one now, but the next stage really – involving some manipulation of Laplace transforms.

I won’t post all the details here, because you can read them here instead and, if this sort of thing matters to you (and why wouldn’t it?) pick up the bounty I am offering on the maths Stack Exchange.

Puzzle about an M/G/1 queue


I am deeply puzzled by a question about the behaviour of an M/G/1 queue – i.e., a queue with a Markovian distribution of arrival times, a General distribution of service times and 1 server. I have asked about this on the Math Stackexchange (and there’s now a bounty on the question if you’d like to answer it there) but, as I am getting nowhere with it, I thought I’d ask it here too.

(This is related to getting a more rigorous presentation on thrashing into my PhD thesis.)

Consider an M/G/1 queue with Poisson arrivals of rate \lambda – this comes from Cox and Miller’s (1965) “The Theory of Stochastic Processes” (pp 240–241) and also Cox and Isham’s 1986 paper “The Virtual Waiting-Time and Related Processes”.

My question is what is the difference between (using the authors’ notation) p_0 and p(0,t)? The context is explained below…

In the 1965 book (the 1986 paper presents the differentials of the same equations), X(t) is the “virtual waiting time” of a process and the book writes of “a discrete probability p_0(t) that X(t)=0, i.e., that the system is empty, and a density p(x,t) for X(t)>0”.

The virtual waiting time runs down at unit rate, i.e., if X(t)\leq\Delta t and there are no arrivals in time \Delta t then X(t + \Delta t) = 0.

The distribution function of X(t) is then given by:
F(x,t)=p_0(t)+\int_{0}^{x}p(z,t)dz

They then state:
p(x, t+ \Delta t) = p(x + \Delta t, t)(1 - \lambda \Delta t) +p_0(t)b(x)\lambda\Delta t + \int_{0}^{x}p(x - y, t)b(y)dy\lambda\Delta t + o(\Delta t)

I get all this – the first term on the RHS is a run-down of X(t)>0 with no arrivals, the second is adding b(x) of service time when the system is empty at t and the third, convolution-like, term is adding b(y) of service time from an arrival when it’s not empty at t. (The fourth accounts for there being more than one arrival in \Delta t, but that tends to zero much faster than \Delta t and so drops out as \Delta t \to 0.)
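For what it’s worth, letting \Delta t \to 0 here (writing p(x+\Delta t, t) \approx p(x,t) + \Delta t\,\partial p(x,t)/\partial x) gives what I take to be the differential form the 1986 paper works with – this reconstruction is mine:

\frac{\partial p(x,t)}{\partial t} = \frac{\partial p(x,t)}{\partial x} - \lambda p(x,t) + \lambda p_0(t)b(x) + \lambda\int_{0}^{x}p(x-y,t)b(y)dy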

And … and this is where I have the problems …

p_0(t+\Delta t)=p_0(t)(1-\lambda\Delta t) +p(0,t)\Delta t(1 - \lambda\Delta t) + o(\Delta t)

The first term of the RHS seems clear – the probability that the system is empty at t multiplied by the probability there will be no arrivals in \Delta t, but the second is not clear to me at all.

I assume this term accounts for the probability of the system “emptying” during \Delta t, but I don’t see how that works – is anyone able to explain?

In other words, how does p(0,t)\Delta t(1 - \lambda\Delta t) represent this draining? Presumably (1 - \lambda\Delta t) again represents the possibility of zero arrivals in \Delta t, so how does p(0, t)\Delta t represent the X(t) \leq \Delta t situation?

If we take the equilibrium situation where p_0(t) = p_0 and p(x, t) = p(x) then, differentiating and setting p^{\prime}_0 = 0, we get p(0) = \lambda p_0 – so, again, what does p(0) represent?
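I can at least check that equilibrium relation numerically. Here is a rough sketch of my own (not from Cox and Miller): by PASTA the waiting times seen by Poisson arrivals have the same distribution as the virtual waiting time, so I iterate the Lindley recursion, taking exponential service times for simplicity (so strictly this is M/M/1), and compare the probability mass at zero with the density just above zero:

```python
# Iterate the Lindley recursion W' = max(0, W + S - A) and estimate both
# p_0 = P(W = 0) and the density p(0+) of W just above zero.
import random

lam, mu = 0.5, 1.0          # arrival and service rates, lam/mu < 1
n, eps = 1_000_000, 0.01    # number of samples and bin width for the density

w = 0.0
at_zero = near_zero = 0
for _ in range(n):
    if w == 0.0:
        at_zero += 1
    elif w <= eps:
        near_zero += 1
    w = max(0.0, w + random.expovariate(mu) - random.expovariate(lam))

p0 = at_zero / n                      # estimate of p_0
density_at_0 = near_zero / (n * eps)  # estimate of p(0+)
print(p0, density_at_0, density_at_0 / p0)   # the ratio should be close to lam
```

With \lambda = 0.5 and \mu = 1 the ratio does come out close to \lambda, which reassures me the relation is right – but it still leaves me unable to say what p(0) actually represents.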

Reviving this blog … with a question about binary trees


Work changes and a determination to actually finish my PhD mean I really should make a bit of an effort here and so I will.

Here is a puzzle about binary trees that has been bothering me – it comes from my PhD research…

In that research I am investigating how to implement virtual memory on a many-core Network-on-Chip (NoC) system. Essentially I have been building and running emulators.

The core of the emulation is the “memory tree” – the connection between the cores of the NoC and the global memory controller. The “Bluetree” memory tree is a binary tree arrangement, with the root node (level 0) connecting to the memory controller and the leaves at the NoC tiles (with each tile having a single processor).

At every non-leaf level of the tree there are multiplexers (muxes), each with a buffer that can hold a single memory packet – so if two requests for global memory arrive at once and the buffer is free, there needs to be an arbitration process to decide which packet gets to occupy the buffer and which has to wait.

We have 128 leaf nodes – as in this tree…

Binary tree with 128 leaf nodes

With this many leaf nodes, the question of the arbitration policy of the muxes is important. A naïve approach would be to say: if two packets contend for the buffer, let the packet on the left win the first time and the packet on the right the next time – after all, this guarantees no starvation.

But it is actually a hopeless policy … the leftmost leaf node (we number the leaves from zero) faces no opposition, while the rightmost (number 127) loses out to every other node.

But the leaf node’s number alone is not a sufficient way of describing the relative amount of blocking – for instance leaf node 16 appears to be in a better place than leaf node 15, as going up the tree leaf node 15 can be blocked at level 6 (the root node is at level 0), at level 5, level 4, level 3 and level 2, while leaf node 16 is only blocked at level 1.
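One way to tabulate this – a little sketch of my own, not part of the emulator, and the exact level labels depend on how you count them – is to list, for each leaf index, the levels at which its traffic arrives on the right-hand (losing) input of a mux:

```python
# Assuming the root mux is level 0 and the leaves hang off level-6 muxes,
# leaf i arrives on the right-hand input of the level-l mux on its path
# exactly when bit (6 - l) of i is set.
def right_hand_levels(leaf, depth=7):
    """Levels (root = 0) at which this leaf is the right-hand input of a mux."""
    return sorted(depth - 1 - b for b in range(depth) if leaf & (1 << b))

for leaf in (0, 15, 16, 127):
    print(leaf, right_hand_levels(leaf))
# 0    []                      - never on the losing side
# 15   [3, 4, 5, 6]            - loses at several levels low in the tree
# 16   [2]                     - loses only once, nearer the root
# 127  [0, 1, 2, 3, 4, 5, 6]   - loses at every level
```

So it seems to be the pattern of set bits in the leaf index, rather than its magnitude, that matters – which is exactly what I am struggling to express properly.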

Practical testing of this naïvely fair approach in the emulator gives results like the following… as can be seen, the right-hand part of the tree performs appallingly, but even there lots of variation is visible:

some test results

My question is this: what is the correct (i.e., correct mathematically) way of describing the order of the leaf nodes?

From “The Foundations of Mathematics”


This is a fragment of a question from the first chapter of “The Foundations of Mathematics” (Amazon link):

Record whether you think the following statements are true or false:

(a) All of the numbers 2, 5, 17, 53, 97 are prime.

(b) Each of the numbers 2, 5, 17, 53, 97 is prime.

(c) Some of the numbers 2, 5, 17, 53, 97 are prime.

(d) Some of the numbers 2, 5, 17, 53, 97 are even.


For what it’s worth, I thought all of these statements were true.

Could someone explain this contradiction to me?


Reading on with Julian Havil’s Gamma: Exploring Euler’s Constant and inspired by his discussion of the harmonic series, I come across this:

\frac{1}{1-e^x} = 1 + e^x + e^{2x} + e^{3x} + ...

Havil calls this a “non-legitimate binomial expansion” and it seems to me it can be generalised:

(1 - r^x)^{-1}= 1 + r^x + r^{2x} + r^{3x} + ...

as 1 = (1 - r^x)(1 + r^x +r^{2x}+r^{3x}+... )= 1 + r^x +r^{2x}+r^{3x}+...-r^x-r^{2x}-r^{3x}-...

And, indeed if we take x=-1, r=2 we get:

\frac{1}{1-2^{-1}} = 2 = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} +... at the limit.

But if we have x \geq 0 it is divergent and the identity, which seems algebraically sound to me, breaks down. E.g., r=2, x=2:

\frac{1}{1-4} = -\frac{1}{3} = 1 + 4 + 16 + 64 + ...
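A quick look at the partial sums (my own check) makes the contrast plain:

```python
# Partial sums of 1 + r^x + r^(2x) + r^(3x) + ... for the two cases above.
def partial_sums(r, x, terms=10):
    total, sums = 0.0, []
    for k in range(terms):
        total += r ** (k * x)
        sums.append(total)
    return sums

print(partial_sums(2, -1))   # 1.0, 1.5, 1.75, ... creeping up towards 2
print(partial_sums(2, 2))    # 1.0, 5.0, 21.0, 85.0, ... nothing like -1/3
```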

So what is the flaw in my logic?

In Pursuit of the Traveling Salesman


In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation

A representation of the relation among complexity classes, which are subsets of each other (Photo credit: Wikipedia)

This is a strange little book – and while I am happy to recommend it, I am not really clear in my own mind what sort of a book it is meant to be.

A popular description of a famous problem in the “NP” space and a gentle introduction to the whole issue of computational complexity and complexity classes? Not really: it assumes a bit too much knowledge for that.

So, a canter round the basic maths of the TSP and complexity? Not really that either, as it is light on mathematical specifics.

Still, it’s a decent book.

Was new maths really such a disaster?


Freeman Dyson (Photo credit: Wikipedia)

On holiday now, so I fill my time – as you do – by reading books on maths.

One of these is Julian Havil’s Gamma: Exploring Euler’s Constant.

The book begins with a foreword by Freeman Dyson, in which he discusses the general failure, as he sees it (and I’d be inclined to agree), of mathematical education. He first dismisses learning by rote and then condemns “New Mathematics” as – despite its efforts to break free of the failures of learning by rote – an even greater disaster.

Now, I was taught a “new maths” curriculum up to the age of 16 and I wonder if it really was such a disaster. The one thing I can say is that it didn’t capture the beauty of maths in the way that the ‘A’ level (16–18) curriculum came close to doing. At times I really wondered what it was all about – at the age of 15, matrix maths seemed excessively abstract.

But many years later I can see what all that was about, and I do think that my new maths education gave me quite a strong grounding in a lot of the fundamentals of computing and the applied maths it involves.

To the extent that this blog does have an audience I know that it is read by those with an interest in maths and computing and I would really welcome views on the strengths and weaknesses of the new maths approach.