Ten Great Ideas About Chance: A Review


Unfortunately, Ten Great Ideas About Chance is a disappointment.

The central idea of the book is to look at ten key mathematical-philosophical ideas in probability and, using the history of the idea, explain what they are about and why they matter.

It’s not that the book doesn’t have some very interesting material, but it fails to hit its target over and over again and, unfortunately, even contains some obvious and – presumably – not so obvious errors.

This review states it so much better than I can, so here is an extract:

The chapters are invariably a mix of
1. a trivial example that does not penetrate enough the intended topic because it contains too much of the familiar and too little of the topic that’s being introduced
2. references to original texts that are impenetrable nineteenth century translations into English from eighteenth century originals written in French or German or Latin
3. statements of complex results that would take fifty pages to arrive at if the proofs were shown
4. cheerleading

So what I re-lived by reading this book is my Freshman Year nightmare math class where three times a week I’d follow the first five minutes of the lecture only to subsequently find myself furiously copying from the board so I can read my lecture notes later at home and try to make sense of them.

Bayes and the billiard balls: and what’s the probability the Sun rises tomorrow?


This is again from Ten Great Ideas About Chance.

Some years ago I wrote about measuring how fair, or otherwise, a dime might be by applying Bayes’s theorem, which requires us to start with a view on the fairness of the coin (the so-called ‘prior’).

Then, as is typical in such examples, I made no assumption about fairness and so treated it as equally probable that the coin was completely biased to heads, completely biased to tails, or anything in between.

Bayes, it seems, made a similar initial assumption and used a very appealing analogy to explain this – with billiard balls.

Take N + 1 billiard balls and run them across a snooker table from left to right – assuming you have applied even slightly different forces to them they will end up in different positions.

So if you pick one of the balls, how many are to the right or to the left of the picked ball? If there are M balls to the right of our picked ball, we can also model this as how many heads (M) we get out of N coin tosses. So if we pick the rightmost ball then M=0, the second from the right M=1, and if we pick the leftmost then M=N. But for a given M, if we pick our ball at random, the probability must be:

\frac{1}{N + 1}

Which, of course, gives us a flat (uniform) prior.
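
If you want to see that flat prior emerge, here is a quick simulation (my own sketch, not from the book): roll N + 1 balls to uniformly random positions, pick one at random and count how many lie to its right.

# roll N + 1 balls, pick one at random, count the balls to its right (M);
# every value of M from 0 to N should appear with probability about 1/(N + 1)
set.seed(1)
N <- 10
trials <- 100000
M <- integer(trials)
for (t in 1:trials) {
  positions <- runif(N + 1)                     # the rolled balls' positions
  picked <- sample(N + 1, 1)                    # pick one ball at random
  M[t] <- sum(positions > positions[picked])    # balls to the right of it
}
print(table(M) / trials)                        # each of 0..N appears ~1/11 of the time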

And what about the odds on the Sun rising in the morning?

Laplace’s rule of succession, which flows from Bayes’s rule, is:

 \frac{M+1}{N+2}

(Where there have been N tests and M successes).

Now, if we assume recorded history began in approximately 6000BC and that there have been no recorded instances of the Sun failing to rise in the 8000 years this encompasses, that means there have been 2,920,000 tests (8000 years × 365 sunrises), all of which have been successes.

So, very approximately, we can assign a probability of 0.9999996575 to the Sun rising tomorrow or, to put it another way, the bookies might want to offer you 3,000,000 – 1 or thereabouts against it failing to happen.
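
The arithmetic is easy to check (a quick R note of my own):

# Laplace's rule of succession applied to the sunrise example above:
# 8000 years of recorded history at 365 sunrises a year, with no failures
N <- 8000 * 365             # 2,920,000 tests
M <- N                      # every one a success
pRise <- (M + 1) / (N + 2)
print(pRise)                # 0.9999996575...
print(1 / (1 - pRise))      # about 2,920,002, i.e. roughly 3,000,000 - 1 against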

What’s the evidence your numbers are faked?


Currently reading: Ten Great Ideas About Chance.

It can be a bit of a frustrating read – a lot of the chapters about gambling and judgement seem poorly explained to me, especially if you’ve ever actually placed a bet on anything (I am far from a regular gambler – I’m not sure I’ve placed a bet on anything in the last decade – but I still know how it works).

But it also has lots of exceptionally interesting stuff: not least the revelation that numbers in naturally occurring data sets tend to begin with 1 about 0.301 of the time (Benford’s law), whilst fraudsters (including election fixers and presumably scientific experiment fixers) tend to produce numbers beginning with 1 only 0.111… (1/9) of the time – as though every leading digit were equally likely.
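
For reference, Benford’s law gives the expected proportion of each leading digit as log10(1 + 1/d) – a quick check of my own, not from the book:

# expected leading-digit proportions under Benford's law versus the uniform 1/9
d <- 1:9
benford <- log10(1 + 1/d)
print(round(benford, 3))    # 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046
print(round(1/9, 3))        # 0.111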

So, taking my life in my hands – what about the results in my PhD? (Read the thesis here.) I am actually calculating this as I go along, so I really do hope it doesn’t suggest fraud!

Anyway – first table, which has a calculated SPECmark figure based on other researchers’ work – so not really my findings, though the calculations are mine.

Numbers are: 35, 22.76, 13.41, 7.26, 3.77, 1.94, 1.01, 0.55, 0.31, 0.20 and 0.14

And (phew) that means 4 out of 11 (0.364) begin with 1, so this table looks fraud free on this test.
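
(For anyone who wants to repeat the check, this is the sort of thing I mean – the leadingDigit helper is just my own quick illustration, not code from the thesis.)

# proportion of the SPECmark figures above whose leading digit is 1
leadingDigit <- function(x) floor(x / 10^floor(log10(x)))
specmark <- c(35, 22.76, 13.41, 7.26, 3.77, 1.94, 1.01, 0.55, 0.31, 0.20, 0.14)
print(mean(leadingDigit(specmark) == 1))    # 4 of 11, about 0.364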

But, as I say, while those are my calculations, they are not my findings at root.

So moving on to some real findings.

First, the maximum total length of time it takes for a simulation to complete a set of benchmarks in simulated cycles: 43361267, 1428821, 1400909, 5760495, 11037274, 2072418, 145291356, 5012280.

Here fully half of the numbers begin with 1. Hmmmm.

The mean time for the same: 42368816, 1416176, 1388897, 5646235, 10964717, 2026200, 143276995, 4733750.

Again, half begin with one.

What about the numbers within the results – in this case the number of cycles lost due to waiting on other requests?

The numbers are: 35921194, 870627, 844281, 4364623, 1088954, 1446305, 110996151, 3420685

Now the proportion has fallen to 3/8 (0.375) and I can feel a bit more comfortable (not that the other results suggested I was following the fraudsters’ one-ninth pattern in any case.)

Later I produce numbers for an optimised system. How do they perform?

The maxima now become: 2514678, 1357224, 1316858, 3840749, 10929818, 1528350, 102077157, 3202193.

So the proportion beginning with 1 has actually risen to 5/8.

And the means show a similar pattern. Worse news with the blocks though. They now become: 730514, 775433, 735726, 2524815, 806768, 952774, 64982307, 1775537. So I am left with 1/8 (0.125) starting with 1 – something dangerously close to 1/9.

Can I save myself? I hope so … the figures above are for the first iteration of the system, but when we look at subsequent iterations a different pattern (for the blocks) emerges: 130736, 0, 0, 1612131, 97209, 232131, 64450433, 1117599.

This is now back to 3/8.

Of course, I didn’t cheat, and I also suspect (I would, wouldn’t I?) that the block count is a better measure of whether I had or not: the overall execution time of the benchmarks is, in some sense, a function of how long their execution path is, and that is effectively determined in advance, whereas the blocking in the system is an emergent property. If I had faked the whole thing I would have had less to go on and been much more likely to simply make it up.

Well, that’s my story and I’m sticking to it.



Labour leadership model update


The Labour Party leadership nomination process is now at a mature stage – 485 local Labour parties have made nominations, so there are probably fewer than 100 left to go.

The pattern of those nominations is pretty clear: a big lead for Keir Starmer (currently backed by 280 local parties), with Rebecca Long-Bailey on just under half his total (131), Lisa Nandy on just under half as many again (56), and Emily Thornberry in a poor fourth place with 18 – generally thought unlikely to reach the threshold of 33 nominations needed to progress to the main ballot.

The discussion here will be about what mathematical modelling based on the nominations might be able to tell us about the outcome of that ballot rather than the politics of the contest.

My initial thought was that a Zipf model might fit with the nominations – here outcomes are proportional to the inverse of their rank raised to a power. So the first candidate’s results are proportional to:

\frac{N}{1^R}

And second placed candidate’s results are proportional to:

\frac{N}{2^R}

And so on, where N is a constant and R a coefficient. In fact this model worked well for a while, but Nandy’s (and particularly Thornberry’s) under-performance has seen it break down: while R has tended towards just under 1.1 for Long-Bailey, it’s about 1.45 for Nandy and close to 2 for Thornberry.
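
To make that concrete, the implied exponent for each candidate can be read straight off the nomination counts (a quick check of my own, taking Starmer at rank 1 as the baseline):

# implied Zipf exponent R per candidate if nominations follow N / rank^R
noms <- c(Starmer = 280, LongBailey = 131, Nandy = 56, Thornberry = 18)
R <- log(noms[1] / noms[2:4]) / log(2:4)
print(round(R, 2))    # about 1.10 for Long-Bailey, 1.46 for Nandy, 1.98 for Thornberry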

With no obvious rule, and with Thornberry in particular falling further behind, I have relied on heuristics to get what looks like a good model for the outcome if all 645 CLPs in Britain nominated: currently that is Starmer 372, Long-Bailey 175, Nandy 75 and Thornberry 24.

From that I try to find a share of the vote that would match this outcome – and here I have to assume that all candidates win their CLP ballots on the first round of voting. (That obviously isn’t true but I don’t have any substantial data to account for preference voting and so have no choice.)

To do this I assume that supporters of the different candidates are randomly distributed around the country but that only a smallish proportion of them (a mean of 6%) come to meetings. Then the candidate with the most support is likely to win the most meetings, but the luck of the draw means other candidates can win too. The closer the candidates’ support, the more likely it is that underdogs win individual meetings, and so the point is to find a share that, when tested against the randomly generated outcomes of 645 meetings, gives a result closest to my projection.
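
Here is a minimal sketch of one run of that meeting model, using the fitted shares quoted below – the notional 500-member CLP and the exact spread of the turnout draw are my own illustrative assumptions, not the real figures:

# simulate 645 CLP nomination meetings: roughly 6% of a notional 500-member
# CLP turns up, each attendee backs a candidate in proportion to the shares,
# and the candidate with the most votes at the meeting takes the nomination
set.seed(7)
shares <- c(Starmer = 0.313, RLB = 0.271, Nandy = 0.238, Thornberry = 0.178)
wins <- setNames(numeric(4), names(shares))
for (i in 1:645) {
  turnout <- max(0.01, rnorm(1, 0.06, 0.01))    # ~6% of members attend
  attendees <- rbinom(1, 500, turnout)
  if (attendees < 1) next
  votes <- rmultinom(1, attendees, shares)
  winner <- which.max(votes)                    # ties go to the first listed
  wins[winner] <- wins[winner] + 1
}
print(wins)    # Starmer wins most meetings, but the underdogs still take some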

Currently that is Starmer 31.3%, Long-Bailey 27.1%, Nandy 23.8% and Thornberry 17.8%. The figures at the bottom end tend to gyrate – it’s so hard for Thornberry to win that a small change in her numbers appears to generate a big change in share up or down – but at the top, between Starmer and Long-Bailey, the roughly 31 v. 27 pattern has been more or less stable over the last week and more.

This isn’t a perfect fit – the range of options over the four dimensional “hyperspace” of the different candidates’ support is enormous, but it is the best fit the computer software has found after 1000 (guided) guesses.

The guesses employ “Monte Carlo methods” – vary the parameters randomly (technically stochastically) around a central figure and see if the fit (as measured by the squared error) is better or worse than the current estimate. If the fit is better then we use those figures as the basis of the next guess and so on.
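
A minimal sketch of that search (my own reconstruction of the idea, not the code actually used – the simulated-meetings objective below assumes a notional 500-member CLP and a mean 6% turnout):

# guided random search: perturb the shares around the current best guess,
# score each guess by the squared error between simulated nomination wins
# and the projected totals, and keep the guess only if the error improves
set.seed(99)
target <- c(372, 175, 75, 24)    # projected nominations if all 645 CLPs voted

nominationError <- function(shares, nCLP = 645, members = 500) {
  wins <- numeric(4)
  for (i in 1:nCLP) {
    attendees <- rbinom(1, members, max(0.01, rnorm(1, 0.06, 0.01)))
    if (attendees < 1) next
    votes <- rmultinom(1, attendees, shares)
    wins[which.max(votes)] <- wins[which.max(votes)] + 1
  }
  sum((wins - target)^2)
}

best <- c(0.31, 0.27, 0.24, 0.18)    # hand-picked starting guess
bestErr <- nominationError(best)

for (g in 1:1000) {                  # 1000 guided guesses
  guess <- abs(rnorm(4, best, 0.005))
  guess <- guess / sum(guess)        # keep the shares summing to 1
  err <- nominationError(guess)
  if (err < bestErr) {               # accept only improvements
    best <- guess
    bestErr <- err
  }
}
print(round(best, 3))
print(bestErr)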

From this – if I have the time, as it takes eight or so hours to run – I can then run a big Monte Carlo simulation of the election itself. Here, instead of using a 6% average for turnout, I use 78% (the highest for recent Labour leadership elections), but again randomly vary that for each CLP. I also randomly vary the support each candidate might expect to win in each CLP to account for local factors (e.g. the candidate opened your jumble sale a while back and so people know and like them).

This simulation runs 7000 times and allows me (like a weather forecaster saying there is a 90% chance of rain) to give some estimates of the uncertainty in the outcome. Last time it was run – on Saturday – the uncertainty was tiny: Starmer could expect to win the first ballot 99.4% of the time, and Nandy – who previously had a very small (0.5%) chance of coming second – was always coming third.

How reliable is this? Well, it’s a projection, not a prediction: the eligibility criteria for the main ballot are different (and not all the voters will be Labour members). It doesn’t account for preference voting and it doesn’t account for regional support (e.g., Long-Bailey has done well in the North West and while this boosts her nomination numbers it may also skew the estimate of her support higher than reality). Nobody really thinks that Emily Thornberry is winning 18% support either – it looks more likely that her team has targeted some local parties to win nominations.

But, all that said, I don’t think it’s a bad model either – earlier opinion polls suggested Nandy’s support was in single figures, and the nominations (and the model) show that is wrong or outdated. The core message – that Starmer has a small but solid and stable lead – seems very right.

Updated Labour leadership model


I have updated the model in several ways – to make it a slightly better analogue of the real world and to follow the developments in the contest itself.

No beating about the bush – the predictions for the outcome of the first round of balloting (as I say here please don’t take this seriously) are:

Starmer has a 99.9% chance of topping the first ballot with 30.6% support.
Long-Bailey has a 0.1% chance of topping the first ballot with 26.3% support.
Nandy cannot top the ballot, but with 23.9% support she has a 5.1% chance of beating Long-Bailey (and so coming second at this point).
There isn’t really any good news for Thornberry on 19.2% support.

As a reminder: the support shares are the central case – we tested 7000 examples in which we stochastically (or, in simple terms, randomly) alter the variables around this central case (a so-called ‘Monte Carlo simulation’).

The core improvements are:

  1. We stochastically alter the support for each candidate on a constituency by constituency basis against the central case – we think this is more likely to reflect the real world where there will be pools of support or distrust for each candidate locally – eg because they recently spoke at a meeting and so on.
  2. We assume a much lower turnout for the nomination meetings than before – this makes it easier for more marginal candidates to win them – so the central case is a 6% turnout but again there is a stochastic variation around that with an upper and lower limit.
  3. We have a better fitting procedure for the core vote share – before I was just plugging values in and picking an answer that looked like a good fit. Now I still do that but then subject that guess to a Monte Carlo method test to see if we can find a better fit by exploring values in what is in effect the four-dimensional hyperspace of this problem – looking to minimise the squared difference error.

Code is below:

#!/usr/bin/env Rscript
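
# Monte Carlo simulation of the leadership contest: 7000 runs, each drawing
# national support shares for the four candidates and then simulating a vote
# in every CLP (grouped into 11 regions), recording CLP wins and the national
# vote totals per candidate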

region<-c('London', 'Scotland', 'WMids', 'EMids', 'Yorkshire', 
	  'North', 'NWest', 'SEast', 'Eastern', 'SWest', 'Wales')
Membership<-c(115943, 20123, 39296, 34001, 50562, 27971, 
	      73250, 66183, 40943, 46530, 26894)
CLPs<-c(73, 73, 59, 46, 54, 29, 74, 84, 58, 55, 40)

# central-case first-preference shares: Starmer, Long-Bailey, Nandy, Thornberry
sharesX<-c(0.306, 0.263, 0.239, 0.192)
results.data<-data.frame(Starmer = integer(), RLB = integer(),
			 Nandy = integer(), Thornberry = integer(), 
			 stringsAsFactors=FALSE)

# main Monte Carlo loop: one full simulated contest per iteration
for (x in 1:7000) {
# draw this run's national shares around the central case (sd = 0.01)
starmerShare<-rnorm(1, sharesX[1], 0.01)
rlbShare<-rnorm(1, sharesX[2], 0.01)
nandyShare<-rnorm(1, sharesX[3], 0.01)
thornberryShare<-rnorm(1, sharesX[4], 0.01)
# cumulative national shares for this run (not used below: each CLP draws its own local shares)
shares<-c(starmerShare, starmerShare + rlbShare, 
	  starmerShare + rlbShare + nandyShare, 
	  starmerShare + rlbShare + nandyShare + thornberryShare)

starmerW<-0
rlbW<-0
nandyW<-0
etW<-0

votesStarmer<-0
votesRLB<-0
votesNandy<-0
votesET<-0

# loop over the 11 regions, then over each CLP within the region
for (reg in 1:11)
{
	nameRegion<-region[reg]
	starmerC<-0
	rlbC<-0
	nandyC<-0
	etC<-0
	avMembership<-Membership[reg]/CLPs[reg]
	# draw a membership size for each CLP in the region; the pmax guard stops
	# a stray small or negative draw breaking the per-member loop below
	distMembership<-pmax(rnorm(CLPs[reg], avMembership, avMembership/2.5), 30)
	for (p in 1:CLPs[reg])
	{
		localStarmer<-rnorm(1, starmerShare, 0.05)
		if (localStarmer < 0.02) {
			localStarmer<-0.02
		}
		localRLB<-rnorm(1, rlbShare, 0.05) 
		if (localRLB < 0.02) {
			
			localRLB<-0.02
		}
		localNandy<-rnorm(1, nandyShare, 0.05) 
		if (localNandy< 0.02) {
			localNandy<-0.02
		}
		localThornberry<-rnorm(1, thornberryShare, 0.05)
		if (localThornberry < 0.02) {
			localThornberry<-0.02
		}
		localShares<-c(localStarmer, localStarmer + localRLB,
			localStarmer + localRLB + localNandy,
			localStarmer + localRLB + localNandy + localThornberry)
		supportNorm<-localStarmer + localRLB + 
			localNandy + localThornberry
		# turnout at this CLP: ~N(0.6, 0.1), clamped to [0.3, 0.9] below,
		# with a floor so that the expected number of voters is at least 30
		turnoutThreshold<-rnorm(1, 0.6, 0.1)
		if (turnoutThreshold > 0.9) {
			turnoutThreshold <- 0.9
		}
		if (turnoutThreshold < 0.3 ) {
			turnoutThreshold <- 0.3
		}
		if (turnoutThreshold * distMembership[p] < 30)
		{
			turnoutThreshold <- 30/distMembership[p]
		}
		starmerV<-0
		rlbV<-0
		nandyV<-0
		etV<-0
		# simulate each member: they vote with probability turnoutThreshold,
		# picking a candidate in proportion to the local shares
		for (v in 1:distMembership[p])
		{
			turnout<-runif(1)
			if (turnout > turnoutThreshold) {
				next
			}
			ans<-runif(1) * supportNorm
			if (ans <= localShares[1]) {
				starmerV = starmerV + 1
				next
			}
			if (ans <= localShares[2]) {
				rlbV = rlbV + 1
				next
			}
			if (ans <= localShares[3]) {
				nandyV = nandyV + 1
				next
			}
			etV = etV + 1
		}
		votesStarmer<-votesStarmer + starmerV
		votesRLB<-votesRLB + rlbV
		votesNandy<-votesNandy + nandyV
		votesET<-votesET + etV
		# award the CLP to the candidate with the most votes
		# (ties resolved in the order Starmer, RLB, Nandy, Thornberry)
		if (max(starmerV, rlbV, nandyV, etV) == starmerV) {
			starmerC = starmerC + 1
			starmerW = starmerW + 1
			next
		}
		if (max(rlbV, nandyV, etV) == rlbV) {
			rlbC = rlbC + 1
			rlbW = rlbW + 1
			next
		}
		if (max(nandyV, etV) == nandyV) {
			nandyC = nandyC + 1
			nandyW = nandyW + 1
			next
		}
		etC = etC + 1
		etW = etW + 1
	}
	regionalResult<-sprintf(
	"In %s, Starmer won %i, RLB won %i, Nandy won %i, Thornberry won %i",
       	region[reg], starmerC, rlbC, nandyC, etC)
	print(regionalResult)
}
result<-sprintf(
	"Starmer won %i, RLB won %i, Nandy won %i, Thornberry won %i \n",
       	starmerW, rlbW, nandyW, etW);
print(result)
votesOutcomes<-sprintf("Starmer: %i   RLB: %i   Nandy: %i   Thornberry: %i",
		       votesStarmer, votesRLB, votesNandy, votesET)
print(x)
print(votesOutcomes)
results.data<-rbind(results.data, c(votesStarmer, votesRLB, 
				    votesNandy, votesET))
}
names(results.data)=c('Starmer', 'RLB', 'Nandy', 'Thornberry')