Updated Labour leadership model

I have updated the model in several ways – to make it a slightly better analogue of the real world and to follow the developments in the contest itself.

No beating about the bush – the predictions for the outcome of the first round of balloting (as I say here please don’t take this seriously) are:

Starmer has a 99.9% chance of topping the first ballot with 30.6% support.
Long-Bailey has a 0.1% chance of topping the first ballot with 26.3% support.
Nandy cannot top the ballot, but with 23.9% support she has a 5.1% chance of beating Long-Bailey (and so coming second at this point).
There isn’t really any good news for Thornberry on 19.2% support.

As a reminder: the support shares are the central case – we tested 7000 examples where we stochastically (or randomly in simple terms) alter the variables around this central case (a so-called ‘Monte Carlo simulation‘).

The core improvements are:

  1. We stochastically alter the support for each candidate on a constituency by constituency basis against the central case – we think this is more likely to reflect the real world where there will be pools of support or distrust for each candidate locally – eg because they recently spoke at a meeting and so on.
  2. We assume a much lower turnout for the nomination meetings than before – this makes it easier for more marginal candidates to win them – so the central case is a 6% turnout but again there is a stochastic variation around that with an upper and lower limit.
  3. We have a better fitting procedure for the core vote share – before I was just plugging values in and picking an answer that looked like a good fit. Now I still do that but then subject that guess to a Monte Carlo method test to see if we can find a better fit by exploring values in what is in effect the four-dimensional hyperspace of this problem – looking to minimise the squared difference error.

Code is below:

#!/usr/bin/env Rscript

region<-c('London', 'Scotland', 'WMids', 'EMids', 'Yorkshire', 
	  'North', 'NWest', 'SEast', 'Eastern', 'SWest', 'Wales')
Membership<-c(115943, 20123, 39296, 34001, 50562, 27971, 
	      73250, 66183, 40943, 46530, 26894)
CLPs<-c(73, 73, 59, 46, 54, 29, 74, 84, 58, 55, 40)

sharesX<-c(0.306, 0.263, 0.239, 0.192)
results.data<-data.frame(Starmer = integer(), RLB = integer(),
			 Nandy = integer(), Thornberry = integer(), 
			 stringsAsFactors=FALSE)

for (x in 1:7000) {
starmerShare<-rnorm(1, sharesX[1], 0.01)
rlbShare<-rnorm(1, sharesX[2], 0.01)
nandyShare<-rnorm(1, sharesX[3], 0.01)
thornberryShare<-rnorm(1, sharesX[4], 0.01)
shares<-c(starmerShare, starmerShare + rlbShare, 
	  starmerShare + rlbShare + nandyShare, 
	  starmerShare + rlbShare + nandyShare + thornberryShare)

starmerW<-0
rlbW<-0
nandyW<-0
etW<-0

votesStarmer<-0
votesRLB<-0
votesNandy<-0
votesET<-0

for (reg in 1:11)
{
	nameRegion<-region[reg]
	starmerC<-0
	rlbC<-0
	nandyC<-0
	etC<-0
	avMembership<-Membership[reg]/CLPs[reg]
	distMembership<-rnorm(CLPs[reg], avMembership, avMembership/2.5)
	for (p in 1:CLPs[reg])
	{
		localStarmer<-rnorm(1, starmerShare, 0.05)
		if (localStarmer < 0.02) {
			localStarmer<-0.02
		}
		localRLB<-rnorm(1, rlbShare, 0.05) 
		if (localRLB < 0.02) {
			
			localRLB<-0.02
		}
		localNandy<-rnorm(1, nandyShare, 0.05) 
		if (localNandy< 0.02) {
			localNandy<-0.02
		}
		localThornberry<-rnorm(1, thornberryShare, 0.05)
		if (localThornberry < 0.02) {
			localThornberry<-0.02
		}
		localShares<-c(localStarmer, localStarmer + localRLB,
			localStarmer + localRLB + localNandy,
			localStarmer + localRLB + localNandy + localThornberry)
		supportNorm<-localStarmer + localRLB + 
			localNandy + localThornberry
		turnoutThreshold<-rnorm(1, 0.6, 0.1)
		if (turnoutThreshold > 0.9) {
			turnoutThreshold <- 0.9
		}
		if (turnoutThreshold < 0.3 ) {
			turnoutThreshold <- 0.3
		}
		if (turnoutThreshold * distMembership[p] < 30)
		{
			turnoutThreshold <- 30/distMembership[p]
		}
		starmerV<-0
		rlbV<-0
		nandyV<-0
		etV<-0
		for (v in 1:distMembership[p])
		{
			turnout<-runif(1)
			if (turnout > turnoutThreshold) {
				next
			}
			ans<-runif(1) * supportNorm
			if (ans <= localShares[1]) {
				starmerV = starmerV + 1
				next
			}
			if (ans <= localShares[2]) {
				rlbV = rlbV + 1
				next
			}
			if (ans <= localShares[3]) {
				nandyV = nandyV + 1
				next
			}
			etV = etV + 1
		}
		votesStarmer<-votesStarmer + starmerV
		votesRLB<-votesRLB + rlbV
		votesNandy<-votesNandy + nandyV
		votesET<-votesET + etV
		if (max(starmerV, rlbV, nandyV, etV) == starmerV) {
			starmerC = starmerC + 1
			starmerW = starmerW + 1
			next
		}
		if (max(rlbV, nandyV, etV) == rlbV) {
			rlbC = rlbC + 1
			rlbW = rlbW + 1
			next
		}
		if (max(nandyV, etV) == nandyV) {
			nandyC = nandyC + 1
			nandyW = nandyW + 1
			next
		}
		etC = etC + 1
		etW = etW + 1
	}
	regionalResult<-sprintf(
	"In %s, Starmer won %i, RLB won %i, Nandy won %i, Thornberry won %i",
       	region[reg], starmerC, rlbC, nandyC, etC)
	print(regionalResult)
}
result<-sprintf(
	"Starmer won %i, RLB won %i, Nandy won %i, Thornberry won %i \n",
       	starmerW, rlbW, nandyW, etW);
print(result)
votesOutcomes<-sprintf("Starmer: %i   RLB: %i   Nandy: %i   Thornberry: %i",
		       votesStarmer, votesRLB, votesNandy, votesET)
print(x)
print(votesOutcomes)
results.data<-rbind(results.data, c(votesStarmer, votesRLB, 
				    votesNandy, votesET))
}
names(results.data)=c('Starmer', 'RLB', 'Nandy', 'Thornberry')