Mathematically modelling the overall Labour result

The Zipf model I outlined here looks to be reasonably robust – though perhaps the coefficient needs to drop to somewhere between 1.25 and 1.29 – but can we use this result to draw any conclusions about the actual result itself?

That’s what I am going to try to do here – but be warned: there are a whole host of assumptions involved, and this isn’t really anything more than a mathematical diversion.

The idea is this: if supporters of any given candidate are randomly distributed across all Constituency Labour Parties (dubious – discuss), and we make certain assumptions about the sizes of those CLPs, what level of support tends to generate the sort of nomination-meeting results we are actually seeing?

For the sizes of the 11 Labour party regions and countries we also assume a Zipf distribution, working on the basis that 339,306 members vote in total. The biggest region (nominally London, though we’re not basing this on real membership figures for London, just using a simple model) gets 120,000 voters and the smallest 9,223. The figures decline with the rank of the ‘region’ using a coefficient of 1.07 – the coefficient typically seen worldwide for the rank-size distribution of a country’s major cities.
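A quick sketch of that regional distribution – rounding means the totals aren’t exact, but it lands very close to the figures above:

# Regional electorate sizes under a Zipf law: the biggest region is fixed
# at 120,000 voters and the rest decline as rank^-1.07
regionSize <- round(120000 / (1:11)^1.07)
regionSize[11]   # about 9,223 voters in the smallest region
sum(regionSize)  # about 339,300 voters in total, close to the 339,306 used here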

Each of these notional regions has 56 equally sized CLPs, so CLP size ranges from 2,142 voters per CLP in the biggest region down to 165 in the smallest.
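That is simply each region’s electorate divided equally among its 56 CLPs, which reproduces the clpSize vector hard-coded in the script below to within a voter:

# Average CLP size per region: the regional electorate split across 56 CLPs
clpSize <- round(regionSize / 56)
clpSize  # roughly 2143, 1021, 661, ..., 165 – the script below uses 2142 for the top value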

The target we are trying to hit is the Zipf prediction (for a notional 616 nominations – 11 regions × 56 CLPs) of 358 nominations for Starmer, 145 for Long-Bailey, 86 for Nandy and 25 for Thornberry.

OK, you’ve heard all the blah – here’s the bit you really came for: what does it say about support? Well, it’s sort of good news for Keir Starmer who, this model suggests, is getting about 27.5% support. Rebecca Long-Bailey is picking up 25% so is close behind, but Lisa Nandy matches her on 25%, while Emily Thornberry has 22.5%. On a typical run (as the process is random the precise numbers vary) this gives Starmer 335 nominations, Long-Bailey 154, Nandy 101 and Thornberry 26 – the precise figures matter less than showing that it’s close.
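For reference, those support levels map onto the cumulative break-points the script below uses to bucket each member’s uniform random draw:

# Candidate support levels and the cumulative break-points derived from them
support <- c(Starmer = 0.275, LongBailey = 0.25, Nandy = 0.25, Thornberry = 0.225)
cumsum(support)  # 0.275 0.525 0.775 1.000 – the 'shares' vector in the script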

Now, YouGov’s poll – which I’d trust much more than my prognostications – had very different figures, with Starmer on 46% first preferences and Long-Bailey on 32%.

So why the difference and why do I trust the poll more than this model?

Firstly, and most importantly, because support for candidates isn’t randomly distributed – I’d reason Long-Bailey and Nandy are likely to have disproportionately more supporters in the North West and Starmer in London – and there are many more members in London.

And secondly, because, as I’ve already said, the model makes far too many assumptions.

On the other hand, I do think Nandy has been doing better than the initial polling suggested, so this model is probably right to suggest she’s doing relatively well.

The code (in R) is shown below… but the bottom line is that this guess isn’t likely to be a very good one.

#!/usr/bin/env Rscript

# Monte Carlo model of the Labour leadership nomination race: 11 notional
# regions each contain 56 CLPs, every member votes for a candidate drawn at
# random with the same probabilities everywhere, and the candidate with the
# most votes in a CLP takes its nomination.

# Average CLP size in each region: the region's Zipf-distributed electorate
# divided equally among its 56 CLPs
clpSize <- c(2142, 1021, 661, 486, 383, 315, 267, 232, 204, 182, 165)

# Cumulative support break-points for a uniform draw on [0, 1]:
# Starmer 27.5%, Long-Bailey 25%, Nandy 25%, Thornberry 22.5%
shares <- c(0.275, 0.525, 0.775, 1.0)

# Running totals of CLP nominations won by each candidate
starmerW <- 0
rlbW <- 0
nandyW <- 0
etW <- 0

for (reg in 1:11)
{
	for (p in 1:56)
	{
		# Simulate one CLP meeting: every member casts a random vote
		starmerV <- 0
		rlbV <- 0
		nandyV <- 0
		etV <- 0
		for (v in 1:clpSize[reg])
		{
			ans <- runif(1)
			if (ans <= shares[1]) {
				starmerV <- starmerV + 1
				next
			}
			if (ans <= shares[2]) {
				rlbV <- rlbV + 1
				next
			}
			if (ans <= shares[3]) {
				nandyV <- nandyV + 1
				next
			}
			etV <- etV + 1
		}
		# The candidate with the most votes takes the nomination
		# (ties break in candidate order)
		if (max(starmerV, rlbV, nandyV, etV) == starmerV) {
			starmerW <- starmerW + 1
			next
		}
		if (max(rlbV, nandyV, etV) == rlbV) {
			rlbW <- rlbW + 1
			next
		}
		if (max(nandyV, etV) == nandyV) {
			nandyW <- nandyW + 1
			next
		}
		etW <- etW + 1
	}
}

result <- sprintf("Starmer won %i, RLB won %i, Nandy won %i, Thornberry won %i\n",
                  starmerW, rlbW, nandyW, etW)
cat(result)  # cat() honours the newline; print() would show it escaped
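For what it’s worth, the member-by-member loop above is slow in R. A vectorised sketch using rmultinom() – a hypothetical rewrite, not the code behind the runs quoted above, though it should be statistically equivalent – does the same job far faster:

#!/usr/bin/env Rscript

# Statistically equivalent model: each CLP's ballot is a single multinomial
# draw rather than a per-member loop
support <- c(0.275, 0.25, 0.25, 0.225)  # Starmer, RLB, Nandy, Thornberry
clpSize <- c(2142, 1021, 661, 486, 383, 315, 267, 232, 204, 182, 165)

wins <- integer(4)
for (reg in 1:11) {
	votes <- rmultinom(56, clpSize[reg], support)  # 4 x 56 matrix of CLP vote tallies
	winners <- apply(votes, 2, which.max)          # which.max breaks ties in candidate order, like the loop above
	wins <- wins + tabulate(winners, nbins = 4)
}
cat(sprintf("Starmer won %i, RLB won %i, Nandy won %i, Thornberry won %i\n",
            wins[1], wins[2], wins[3], wins[4]))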