Coin tossing conundrum


This is from The Mathematics of Various Entertaining Subjects: Research in Recreational Math.

It left me so puzzled it took me a while to get my head around it.

It’s game – Flipping Fun – and the idea is that participants pick a set of three coin tosses (eg THH, HHH, THT and so on) and the winner is the person who has picked the first sequence that comes up.

The mind bending part of this is:

  • The coin is fair so the odds of it turning up as heads or tails on any toss are the same (1/2).
  • Thus any sequence of a given length is equally likely – i.e. if the coin is tossed ten times then HHHHHHHHHH is as likely as HTHTHTHTHT or any other sequence.
  • Despite both of the above facts some sequences of three tosses are much more likely to win than others.

To illustrate this point, think of HHT against TTT. Here HHT is much more likely to win – because once two heads have been tossed (odds 1/4) then it is only a question of waiting for a tail to come up, as tossing a further head keeps the HH sequence alive.

So, for instance, in five tosses the odds of HHT winning are 1/4 (8/32), but the odds of TTT winning inside five tosses are 5/32

Ten Great Ideas About Chance: A Review


Unfortunately Ten Great Ideas About Chance is a disappointment.

The central idea of the book is to look at ten key mathematical-philosophical ideas in probability and, using the history of the idea, explain what they are about and why they matter.

It’s not that the book doesn’t have some very interesting material, but it fails to hit its target over and over again and, unfortunately, even contains some obvious and – presumably – not so obvious errors.

This review states it so much better than I can, so here is an extract:

The chapters are invariably a mix of
1. a trivial example that does not penetrate enough the intended topic because it contains too much of the familiar and too little of the topic that’s being introduced
2. references to original texts that are impenetrable nineteenth century translations into English from eighteenth century originals written in French or German or Latin
3. statements of complex results that would take fifty pages to arrive at if the proofs were shown
4. cheerleading

So what I re-lived by reading this book is my Freshman Year nightmare math class where three times a week I’d follow the first five minutes of the lecture only to subsequently find myself furiously copying from the board so I can read my lecture notes later at home and try to make sense of them.

What’s the evidence your numbers are faked?


Currently reading: Ten Great Ideas About Chance.

It can be a bit of a frustrating read – a lot of the chapters about gambling and judgement appear to be poorly explained to me – especially if you’ve ever actually placed a bet on anything (I am far from a regular gambler – I’m not sure I’ve placed a bet on anything in the last decade but I still know how it works).

But it also has lots of exceptionally interesting stuff: not least the revelation that naturally occurring number sets tend to begin with 1 with a proportion of 0.3 whilst fraudsters (including election fixers and presumably scientific experiment fixers) tend to use such numbers 0.111… (1/9) of the time.

So, taking my life in my hands – what about results in my PhD – (read thesis here). I am actually calculating this as I go along – so I really do hope it doesn’t suggest fraud!

Anyway – first table, which has a calculated SPECmark figure based on other researchers’ work – so not really my findings, though the calculations are mine.

Numbers are: 35, 22.76, 13.41, 7.26, 3.77, 1.94, 1.01, 0.55, 0.31, 0.20 and 0.14

And (phew) that means 4 out of 11 (0.364) begin with 1 and so looks fraud free on this test.

But as I say, those are my calculations, but they are not my findings at root.

So moving on to some real findings.

First, the maximum total length of time it takes for a simulation to complete a set of benchmarks in simulated cycles: 43361267, 1428821, 1400909, 5760495, 11037274, 2072418, 145291356, 5012280.

Here a whole half are numbers that begin with 1. Hmmmm.

The mean time for the same: 42368816, 1416176, 1388897, 5646235, 10964717, 2026200, 143276995, 4733750.

Again, half begin with one.

What about the numbers within the results – in this case the amount of cycles lost due to waiting on other requests?

The numbers are: 35921194, 870627, 844281, 4364623, 1088954, 1446305, 110996151, 3420685

Now the proportion has fallen to 3/8 (0.375) and I can feel a bit more comfortable (not that the other results suggested I was following the fraudsters’ one-ninth pattern in any case.)

Later I produce numbers for an optimised system. How do they perform?

The maxima now become: 2514678, 1357224, 1316858, 3840749, 10929818, 1528350, 102077157, 3202193.

So the proportion beginning with 1 has actually risen to 5/8.

And the means show a similar pattern. Worse news with the blocks though. They now become: 730514, 775433, 735726, 2524815, 806768, 952774, 64982307, 1775537. So I am left with 1/8 (0.125) starting with 1 – something dangerously close to 1/9.

Can I save myself? I hope so … the figures above are for the first iteration of the system, but when we look at subsequent iterations a different pattern (for the blocks) emerges: 130736, 0, 0, 1612131, 97209, 232131, 64450433, 1117599.

This is now back to 3/8.

Of course, I didn’t cheat and I also suspect (I would, wouldn’t I?) that the block count is a better measure of whether I had or not – because the overall execution time of the benchmarks is, in some sense, a function of how long their execution path is and effectively that is determined – the blocking in the system is an emergent property and if I faked the whole thing I’d have less to go on and be much more likely to make it up.

Well, that’s my story and I’m sticking to it.



Another go at modelling the Labour leadership election


I started doing this for fun and that’s still my motivation – so please do not take this seriously and even if I do slip into using the word “prediction”, above all – this is not a prediction.

Anyway my aim is to model the potential outcome of the first round of the ballot of the Labour leadership election using the concrete data that we actually have – namely Labour membership data (newly disclosed to the Daily Mirror) and nominations made by all-member meetings (as reported by the @CLPNominations twitter account). The model is built using the R programming language and the code is available below.

So dealing with the assumptions made…

On membership – unlike before I now have an up-to-date figure for membership and it’s easy to look up the number of CLPs in each region/country and therefore get an average membership. But what I also do is now distribute the modelled memberships as a Gaussian (normal) distribution around this average (in layperson’s terms I assume there is a range of higher and lower memberships clustered around this average in a bell curve shape). Total arbitrarily I chose the standard deviation of this distribution (a measure of how broad the curve is) as 40% of the mean.

(Never tire of recommending this brilliant book – Statistics Without Tears – for anyone with more than a passing interest in polling and sampling.)

Why does this matter? It’s easier, if support is randomly distributed, for relatively less supported candidates to win a nomination in a smaller CLP and vice versa.

On nominations – having robustly held up for most of the week the simple Zipf model I have been using for the nominations started to creak a bit last night – essentially Lisa Nandy, despite reports of some very good performances in terms of votes won, underperformed relative to the model to the benefit of Rebecca Long-Bailey (Keir Starmer kept his proportional share steady). Emily Thornberry had a bad night.

However I am going to cheat a little bit – juggle the coefficient and the rankings – the coefficient falls to 1.1 (which might indicate that the rate at which Starmer’s lead is increasing is slowing but still means that it is increasing) and drop Nandy to 4th place (Thornberry falls to 10th). And hope that the weekend – when I expect many more nominations – makes it all a bit clearer. It’s a kludge but we aren’t taking this all that seriously, are we?

Then we can estimate that if every CLP made a nomination Starmer would take 360, Long-Bailey 172, Nandy 80 and Thornberry 29.

What we want to do is get our model of support to match (reasonably closely) this outcome – but you may have noticed I’ve had to cheat again – because many (but by no means all) of the nomination meetings have been decided by preference balloting because one candidate has not polled at least 50% + 1 vote on the first ballot. Them’s the breaks I’m afraid – modelling the preference voting requires making political decisions which go well beyond this simple maths model and that’s not my purpose here. So I am just treating all of this as though it was a first-past-the-post process.

By trial and error I have found that setting the shares of support to the figures below gives a pretty good match:

Starmer 26.75%
Long-Bailey 25.65%
Nandy 24.70%
Thornberry 22.90%

Obviously these are very tightly grouped results – and that reflects another deep flaw in the model I’m afraid – we make no allowances at all for clusters of regional support and so have to try to draw out the result from a fully random distribution – so, for instance, if we thought that Nandy or Long-Bailey could pick up a lot of nominations in their home region (the North West) then they they might be able to hit our target with lower levels of support (the same applies to Starmer in London). But that is a level of sophistication beyond this model.

The figures above typically generate a result like this:

[1] “In London, Starmer won 43, RLB won 21, Nandy won 9, Thornberry won 0”

[1] “In Scotland, Starmer won 36, RLB won 21, Nandy won 12, Thornberry won 4”

[1] “In WMids, Starmer won 29, RLB won 19, Nandy won 5, Thornberry won 6”

[1] “In EMids, Starmer won 26, RLB won 11, Nandy won 7, Thornberry won 2”

[1] “In Yorkshire, Starmer won 28, RLB won 19, Nandy won 5, Thornberry won 2”

[1] “In North, Starmer won 21, RLB won 3, Nandy won 5, Thornberry won 0”

[1] “In NWest, Starmer won 43, RLB won 20, Nandy won 11, Thornberry won 0”

[1] “In SEast, Starmer won 51, RLB won 21, Nandy won 9, Thornberry won 3”

[1] “In Eastern, Starmer won 36, RLB won 12, Nandy won 10, Thornberry won 0”

[1] “In SWest, Starmer won 35, RLB won 14, Nandy won 4, Thornberry won 2”

[1] “In Wales, Starmer won 16, RLB won 18, Nandy won 4, Thornberry won 2”

[1] “Starmer won 364, RLB won 179, Nandy won 81, Thornberry won 21

But we can go further now and look at the range of outcomes grouped around these shares – in other words use some “Monte Carlo methods” to estimate what the probabilities of certain outcomes are.

To do this here we use the ‘predicted’ shares above as the mean of a normal distribution (with standard deviation of 1% in each case. In simple terms that means that while our central case is that Starmer has the support of 26.75% of members, we might expect that in roughly one case in six he has support of less than 25.75% and in one case in six he has more than than 27.75% – and similar stipulations apply to the other candidates. We then run the simulation 1000 times and look at the distribution of outcomes.

In fact the (lazy, but it’s only for fun) way I have done this means that the variation is likely to be bigger for Long-Bailey and Nandy than for Starmer – Starmer’s support can go up and down but only moves at one end – if Starmer’s support falls and Long-Bailey’s rises she can get up to double the benefit. This is an artefact of the way I have coded this up but I will keep it because (a) I don’t want to wait another hour to finish this by re-running the code (R is great but nobody has ever suggested it is fast) and (b) Starmer is the favourite so he should feel a bit more pressure! The difference can be seen in the shape of the density curves for the candidates in the featured image for this page – Long-Bailey’s and Nandy’s are broader and shorter curves show their results taking a broader range of answers even if Starmer’s mean is well ahead.

I have assumed a 60% turnout and so we get Starmer’s maximum vote as 99,319 and his minimum as 76,685. For Long-Bailey the figures are 97,937 and 69,396, and Nandy 97,011 and 66,246. The notional results for Thornberry are 86,034 and 63,672.

Another problem…barring a big change in circumstances Emily Thornberry won’t be on the final ballot – she’s not getting anything like enough nominations – and so her supporters will have the choice of voting for someone else or just not bothering. Again this is a political question and not one for here.

This “Thornberry problem” makes what follows pretty worthless, unfortunately – certainly if it really is the case that there are 60,000 – 70,000 would-be Thornberry voters out there who will be forced to do something they would prefer not to… but here is the non-predictive prediction:
Keir Starmer has a 61.9% chance of topping the poll, Rebecca Long-Bailey has a 28.2% chance and Lisa Nandy has a 9.9% chance of winning the first ballot. Emily Thornberry does not top the poll in any of the 1000 simulations run.

Code is below – have fun with it.

#!/usr/bin/env Rscript

region<-c('London', 'Scotland', 'WMids', 'EMids', 'Yorkshire', 
	  'North', 'NWest', 'SEast', 'Eastern', 'SWest', 'Wales')
Membership<-c(115943, 20123, 39296, 34001, 50562, 27971, 
	      73250, 66183, 40943, 46530, 26894)
CLPs<-c(73, 73, 59, 46, 54, 29, 74, 84, 58, 55, 40)

sharesX<-c(0.2675, 0.524, 0.771, 1.0)
results.data<-data.frame(Starmer = integer(), RLB = integer(),
			 Nandy = integer(), Thornberry = integer(), 
			 stringsAsFactors=FALSE)

for (x in 1:1000) {

starmerShare<-rnorm(1, sharesX[1], 0.01)
rlbShare<-rnorm(1, sharesX[2], 0.01)
nandyShare<-rnorm(1, sharesX[3], 0.01)
shares<-c(starmerShare, rlbShare, nandyShare, 1.0)

starmerW<-0
rlbW<-0
nandyW<-0
etW<-0

votesStarmer<-0
votesRLB<-0
votesNandy<-0
votesET<-0

for (reg in 1:11)
{
	nameRegion<-region[reg]
	starmerC<-0
	rlbC<-0
	nandyC<-0
	etC<-0
	avMembership<-Membership[reg]/CLPs[reg]
	distMembership<-rnorm(CLPs[reg], avMembership, avMembership/2.5)
	for (p in 1:CLPs[reg])
	{
		starmerV<-0
		rlbV<-0
		nandyV<-0
		etV<-0
		for (v in 1:distMembership[p])
		{
			turnout<-runif(1)
			if (turnout > 0.6) {
				next
			}
			ans<-runif(1)
			if (ans <= shares[1]) {
				starmerV = starmerV + 1
				next
			}
			if (ans <= shares[2]) {
				rlbV = rlbV + 1
				next
			}
			if (ans <= shares[3]) {
				nandyV = nandyV + 1
				next
			}
			etV = etV + 1
		}
		votesStarmer<-votesStarmer + starmerV
		votesRLB<-votesRLB + rlbV
		votesNandy<-votesNandy + nandyV
		votesET<-votesET + etV
		if (max(starmerV, rlbV, nandyV, etV) == starmerV) {
			starmerC = starmerC + 1
			starmerW = starmerW + 1
			next
		}
		if (max(rlbV, nandyV, etV) == rlbV) {
			rlbC = rlbC + 1
			rlbW = rlbW + 1
			next
		}
		if (max(nandyV, etV) == nandyV) {
			nandyC = nandyC + 1
			nandyW = nandyW + 1
			next
		}
		etC = etC + 1
		etW = etW + 1
	}
	regionalResult<-sprintf(
	"In %s, Starmer won %i, RLB won %i, Nandy won %i, Thornberry won %i",
       	region[reg], starmerC, rlbC, nandyC, etC)
	print(regionalResult)
}
result<-sprintf(
	"Starmer won %i, RLB won %i, Nandy won %i, Thornberry won %i \n",
       	starmerW, rlbW, nandyW, etW);
print(result)
votesOutcomes<-sprintf("Starmer: %i   RLB: %i   Nandy: %i   Thornberry: %i",
		       votesStarmer, votesRLB, votesNandy, votesET)
print(votesOutcomes)
results.data<-rbind(results.data, c(votesStarmer, votesRLB, 
				    votesNandy, votesET))
}
names(results.data)=c('Starmer', 'RLB', 'Nandy', 'Thornberry')

The Sleeping Beauty Controversy


I am reading “The Best Writing on Mathematics 2018” and amongst the various fascinating articles is one on “The Sleeping Beauty Controversy” by Peter Winkler.

the problem

(Here’s a video link – which examines the problem but isn’t related directly to Peter Winkler’s piece)

To steal Professor Winkler’s own description from the article:

Sleeping Beauty agrees to the following experiment. On Sunday she is put to sleep, and a fair coin is flipped. If it comes up heads, she is awakened on Monday morning; if tails, she is awakened on Monday morning and again on Tuesday morning. In all cases, she is not told the day of the week, is put back to sleep shortly after, and will have no memory of any Monday or Tuesday awakenings.

When Sleeping Beauty is awakened on Monday or Tuesday, what – to her – is the probability that the coin came up heads?

The bulk of mathematicians are “thirders” – they think the correct answer is 1/3. But a sizable minority are “halfers” – believing the correct answer is 1/2. Still others think the question is undecidable.

For what it’s worth I am with the thirders – Sleeping Beauty would understand that if, for instance, the coin was tossed 100 times she’d be woken around 150 times and so the chances she’s being woken as a result of a head are one-in-three.

The problem restated?

But what really struck me in the article was what Professor Winkler says is a rephrasing (though admits some might disagree) of Sleeping Beauty – because this leads me to a “halfer” position:

Alice, Bob and Charlie have each taken a new sleeping pill. In hospital experiments half the subjects slept through the night… but the other half woke up once in the middle of the night then returned to sleep and woke up in the morning with no memory of the night awakening.

Alice wakes up in the middle of the night… her credence that the pill has worked drops to zero… Bob wakes up in the morning, his credence that the pill worked remains at 1/2…

Charlie, who has blackout shades in his bedroom, wakes up not knowing whether it is morning. According to the thirders, his credence in the efficacy of the pill is 1/3 until he raises the shades…

But how can this be? Charlie has no additional information beyond that he has woken up (as he would do in any case) and so surely his prior information – that the pill is 1/2 effective remains unchanged?

(Here’s another link to a different article on this question.)

“Waiting for her Phillip”by Omnimoving is licensed under CC BY-NC-SA 2.0

Correct mathematics but still strange


Imagine you had an ‘unfair’ coin that returned heads 70% of the time and tails 30% of the time and you played this game with a friend (who knows that the coin is ‘bent’):

You each put £1 in as stake money every round and then toss the coin four times.

If it comes up as four heads, you take all the money in the pot, if it comes up as two heads and two tails (in any order), your friend takes the pot. If neither comes up, you play another round putting another £1 each in the pot.

Who – if you play for long enough – has the chances of coming away with more money?

The answer – and this is what seems strange to me – is your friend – as the odds of exactly two heads and two tails are 0.2642 compared to 0.2401 for exactly four heads.

So even though on each flick the chances of the coin turning up a head are more than twice those of it turning up a tail, the odds that half of the tosses result in tails are better than them all ending in heads.

(If you played it 100 times then your friend could expect to have made a profit of £4.85)

How to win at “Deal or No Deal”


English: Publicity photo from the television s...
English: Publicity photo from the television show Let’s Make a Deal. Pictured are host Monty Hall and announcer Jay Stewart with contestants. (Photo credit: Wikipedia)

I have never actually watched “Deal or No Deal“, but I am assuming it is the same as the “Let’s Make A Deal” TV gameshow described in Ian McEwan‘s spy romp Sweet Tooth– which I have just read in the last 18 hours of plane flights, mid-air turn arounds (bad weather at the destination rather than anything more serious) and airport lounges.

The novel is voiced by a Cambridge maths graduate (and low grade MI5 spook) who at one point explains to her lover the “paradox of choice” based on “Let’s Make A Deal”:

You have three boxes, one contains a prize, the two others are empty.

You pick one and then the dealer reveals one other to be empty. Do you stick with your original choice or pick the other box?

The naive answer is to stick with your original choice. After all the odds of the prize being in the box you first picked, or any other box –  are one in three – now increased to one in two.

But that is wrong.

In fact you double your chances of winning by picking the other box.

Think of it this way. If you picked an empty box as your first choice – and the odds are that you did as there are two of them and only one with a prize – then the dealer has no choice but to reveal the other empty box and so the third box must be the one with the prize.

So the odds of you picking the prize in this way are \frac{2}{3} \times 1 = \frac{2}{3}. The odds that you picked the prize the first time round are just \frac{1}{3} .

Of course, this is not a guaranteed way to win. But you would win two times in every three games if you followed this strategy.

Update: Having now read the Wikipedia entry on “Deal or No Deal” I can see it is similar to “Let’s Make a Deal”, but not the same and, in fact, rather more complex. I am not sure the above analysis of the so-called “Monty Hall Problem” (named after the presenter of “Let’s Make A Deal”) would be much of a guide to a contestant!

More on Huffman encoding


Huffman tree generated from the exact frequenc...
Huffman tree generated from the exact frequencies in the sentence “this is an example of a huffman tree”. This is a remake from scratch of http://commons.wikimedia.org/wiki/Image:Huffman_tree.svg using Cairo. (Photo credit: Wikipedia)

As well as the basic form of Huffman coding, Huffman’s algorithm can also be used to encode streams to deliver compression as well as coding (at a cost of codes being not immediately decipherable).

Once again, with thanks to Information and Coding Theory, here’s a brief explanation.

With Huffman coding we can directly code a binary symbol stream S = \{s_1, s_2\} where s_1 = 0, s_2 = 1 .

But with encoding we can produce a code for S^n where n is arbitrary, e.g. S^3 = \{s_1s_1s_1, s_1s_1s_2, s_1s_2s_1, s_1s_2s_2, s_2s_1s_1, s_2s_1s_2, s_2s_2s_1, s_2s_2s_2 \}

So let us assume (e.g. we are transmitting a very dark image via our fax machine) that the probability of s_1 = \frac{2}{3} and for s_2 = \frac{1}{3}

Then we have the following probability distributions:

S :  \frac{8}{27} , \frac{4}{27} , \frac{4}{27} , \frac{2}{27} , \frac{4}{27} , \frac{2}{27} , \frac{2}{27} , \frac{1}{27}

S^{\prime}: \frac{8}{27}, \frac{4}{27} , \frac{2}{27} , \frac{4}{27} , \frac{2}{27} , \frac{2}{27} , \frac{3}{27}

S^{\prime\prime}: \frac{8}{27}, \frac{4}{27}, \frac{2}{27},\frac{4}{27},\frac{3}{27},\frac{4}{27}

S^{\prime\prime\prime}: \frac{8}{27},\frac{4}{27},\frac{4}{27},\frac{4}{27},\frac{5}{27}

S^{\prime\prime\prime\prime}: \frac{8}{27},\frac{4}{27},\frac{5}{27},\frac{8}{27}

S^{\prime\prime\prime\prime\prime}: \frac{8}{27},\frac{8}{27},\frac{9}{27}

S^{\prime\prime\prime\prime\prime\prime}: \frac{9}{27},\frac{16}{27}

S^{\prime\prime\prime\prime\prime\prime\prime}: 1

We can now compute the average word length in the Huffman code – by simply adding the probabilities of the newly created ‘joint’ symbols:

= \frac{3}{27} + \frac{4}{27} + \frac{5}{27} + \frac{8}{27} + \frac{9}{27} + \frac{16}{27} + 1 = \frac{72}{27}

Which obviously looks like the opposite of compression! But, of course we are encoding three symbols, so the actual word length per symbol becomes \frac{72}{81} = 0.88888888888...., in other words somewhat less than 1.

Why isn’t the universe of infinite density?


Brian Greene at the World Science Festival lau...
Brian Greene at the World Science Festival launch press conference (Photo credit: Wikipedia)

Another speculation produced by Brian Greene’s The Hidden Reality: Parallel Universes and the Deep Laws of the Cosmos.

Imagine the universe was infinite (along the “quilted multiverse” pattern – namely that it streched on and on and we could only see a part). That would imply, assuming that the “cosmological principle” that one bit of the universe looked like any other, applied, that there were an infinite number of hydrogen atoms out there.

So, why is the universe not of infinite density? Because surely Shroedinger’s Equation means that there is a finite probability that electrons could be in any given region of space? (Doesn’t it?)

For any given electron the probability in “most” regions of space is zero in any measurable sense. But if there are an infinite number of electrons then the probability at a given point that there is an electron there is infinite, isn’t it?

OK, I have obviously got something wrong here because nobody is dismissing the “quilted multiverse” idea so simply – but could someone explain what it is I have got wrong?

Update: Is this because space-time is a continuum and the number of electrons a countable infinity?

How fair is my dime, part 2


Continuing my Bayesian experiment with a dime coin. Part one is here.

So, let’s look at the posterior (outcome) probability density function after the next two tosses. First a head:

Two heads, two tails pdfNow we can see the results near the extreme possibilities (ie H= 1, H = 0) being eliminated completely. Interestingly, although, in reality, the evidence (two heads, two tails) points towards a fair coin, ie H = 0.5, the pdf here peaks with H below 0.5, reflecting our earlier ‘estimate’ of a coin biased towards tails.

Now, another tail:

2 heads, 3 tails pdfThe peak is becoming narrower now, with some suggestion that there is a bias to tails, but of course with the coin having only been tossed 5 times, some such “bias” is unavoidable.

Now a head:

Six tosses, three heads, three tails pdfAnd again…

7 tosses pdfAnd again…

pdf after 8 tossesAnd the ninth toss:

pdf after 9 tossesHere the peak is becoming pronounced (and it also looks like the splines used to create the graph are showing signs of Runge’s phenomenon).

And the final, tenth, toss:

pdf after 10 tossesThis does suggest a bias but the pdf is still quite broad – so there is lots of room for further correction. Indeed the thing that strikes me most is how little the suggested bias is after five heads in a row.