Labour leadership model update

The Labour Party leadership nomination process is now at a mature stage – 485 local Labour parties have made nominations, so there are probably fewer than 100 left to go.

The pattern of those nominations is pretty clear. Keir Starmer has a big lead (currently backed by 280 local parties), Rebecca Long-Bailey has just under half his nominations (131) and Lisa Nandy has fewer than half of Long-Bailey’s total (56). Emily Thornberry is in a poor fourth position with 18 and is generally thought unlikely to meet the threshold of 33 needed to progress into the main ballot.

The discussion here will be about what mathematical modelling based on the nominations might be able to tell us about the outcome of that ballot rather than the politics of the contest.

My initial thought was that a Zipf model might fit with the nominations – here outcomes are proportional to the inverse of their rank raised to a power. So the first candidate’s results are proportional to:

\frac{N}{1^R}

And the second-placed candidate’s results are proportional to:

\frac{N}{2^R}

And so on, where N is a constant and R is an exponent. In fact this model worked well for a while, but Nandy’s (and particularly Thornberry’s) under-performance has seen it break down: no single value of R fits all the candidates. Measured against Starmer’s total, R has tended towards just under 1.1 for Long-Bailey, but it’s about 1.45 for Nandy and close to 2 for Thornberry.
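The implied exponents can be checked from the nomination counts quoted above by solving first / rank^R = count for R. This is a minimal sketch in Python; the function name `implied_exponent` is purely illustrative.

```python
import math

# Nomination counts quoted above, in rank order.
nominations = [("Starmer", 280), ("Long-Bailey", 131), ("Nandy", 56), ("Thornberry", 18)]

def implied_exponent(first_count, count, rank):
    """Solve first_count / rank**R == count for R (rank must be 2 or more)."""
    return math.log(first_count / count) / math.log(rank)

first = nominations[0][1]
exponents = {
    name: implied_exponent(first, count, rank)
    for rank, (name, count) in enumerate(nominations[1:], start=2)
}
for name, r in exponents.items():
    print(f"{name}: R = {r:.2f}")
```

If a single Zipf law held, the three values would be roughly equal; here they rise with each rank, which is the breakdown described above.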

With no obvious rule, and with Thornberry in particular falling further behind, I have relied on heuristics to get what looks like a good model for the outcome if all 645 CLPs in Britain nominated: currently that is Starmer 372, Long-Bailey 175, Nandy 75 and Thornberry 24.

From that I try to find a share of the vote that would match this outcome – and here I have to assume that all candidates win their CLP ballots on the first round of voting. (That obviously isn’t true but I don’t have any substantial data to account for preference voting and so have no choice.)

To do this I assume that supporters of the different candidates are randomly distributed around the country but that only a smallish number (a mean of 6%) of them come to meetings. Then the candidate with the most support is likely to win more meetings, but the luck of the draw means other candidates can win too. The closer the support for candidates is, the more likely it is that underdogs can win votes, and so the point is to find a share that, when tested against the randomly generated outcome of 645 meetings, gives a result closest to my projection.
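A rough sketch of one such round of 645 meetings might look like the following. It is not the actual model code: the figure of 500 members per CLP, the fixed 6% attendance probability and the random tie-break are all assumptions made for illustration, and the shares passed in are just example inputs.

```python
import random
from collections import Counter

CANDIDATES = ["Starmer", "Long-Bailey", "Nandy", "Thornberry"]

def simulate_meetings(shares, n_clps=645, members=500, turnout=0.06, rng=None):
    """Count how many CLP meetings each candidate wins on a plurality vote.

    members=500 per CLP is a made-up figure for illustration only.
    """
    rng = rng or random.Random()
    wins = Counter({name: 0 for name in CANDIDATES})
    for _ in range(n_clps):
        # Each member independently turns up with probability `turnout`.
        attendees = sum(rng.random() < turnout for _ in range(members))
        if attendees == 0:
            continue  # nobody came, so no nomination is made
        # Each attendee backs a candidate in proportion to national support.
        votes = Counter(rng.choices(CANDIDATES, weights=shares, k=attendees))
        top = max(votes.values())
        # Break ties at random among the joint leaders.
        winner = rng.choice([c for c in CANDIDATES if votes[c] == top])
        wins[winner] += 1
    return wins

wins = simulate_meetings([31.3, 27.1, 23.8, 17.8], rng=random.Random(1))
print(dict(wins))
```

Because each meeting is small, a modest lead in national support turns into a much larger lead in meetings won, while trailing candidates still pick up the odd nomination by chance.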

Currently that is Starmer 31.3%, Long-Bailey 27.1%, Nandy 23.8% and Thornberry 17.8%. The figures at the bottom end tend to gyrate – it’s so hard for Thornberry to win that a small change in her numbers appears to generate a big change in share up or down – but at the top, between Starmer and Long-Bailey, the roughly 31 v. 27 pattern has been more or less stable over the last week and more.

This isn’t a perfect fit – the range of options over the four-dimensional “hyperspace” of the different candidates’ support is enormous – but it is the best fit the computer software has found after 1000 (guided) guesses.

The guesses employ “Monte Carlo methods” – vary the parameters randomly (technically stochastically) around a central figure and see if the fit (as measured by the squared error) is better or worse than the current estimate. If the fit is better then we use those figures as the basis of the next guess and so on.
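The guess-and-keep loop described above can be sketched as a simple stochastic search. This is a simplified illustration rather than the software actually used: the 30 attendees per meeting, the Gaussian step size and the default of 200 guesses (kept low so it runs quickly) are all assumptions.

```python
import random
from collections import Counter

TARGET = [372, 175, 75, 24]  # projected nominations quoted earlier

def meeting_wins(shares, rng, n_clps=645, attendees=30):
    """Plurality winners of n_clps meetings; attendees=30 is an illustrative guess."""
    idx = list(range(len(shares)))
    wins = [0] * len(shares)
    for _ in range(n_clps):
        votes = Counter(rng.choices(idx, weights=shares, k=attendees))
        top = max(votes.values())
        # Ties are broken at random among the joint leaders.
        wins[rng.choice([i for i in idx if votes[i] == top])] += 1
    return wins

def squared_error(wins, target=TARGET):
    return sum((w - t) ** 2 for w, t in zip(wins, target))

def perturb(shares, rng, step=1.0):
    """Jitter each share with Gaussian noise, then renormalise to 100%."""
    raw = [max(0.1, s + rng.gauss(0, step)) for s in shares]
    total = sum(raw)
    return [100 * r / total for r in raw]

def fit_shares(start, guesses=200, seed=0):
    rng = random.Random(seed)
    best, best_err = start, squared_error(meeting_wins(start, rng))
    for _ in range(guesses):
        trial = perturb(best, rng)
        err = squared_error(meeting_wins(trial, rng))
        if err < best_err:  # keep the guess only if it fits better
            best, best_err = trial, err
    return best, best_err

shares, err = fit_shares([25.0, 25.0, 25.0, 25.0])
print([round(s, 1) for s in shares], err)
```

Each error value comes from a single noisy run of meetings, so a guess can be kept through luck alone. Averaging several runs per guess would steady the search at the cost of more computing time.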

From this, if I have the time (it takes 8 or so hours to run), I can then run a big Monte Carlo simulation of the election itself. Here, instead of using a 6% average for turnout, I use 78% (the highest for recent Labour leadership elections), but again randomly vary that for each CLP. I also randomly vary the support each candidate might expect to win in each CLP to account for local factors (e.g. the candidate opened your jumble sale a while back and so people know and like them).

This simulation runs 7000 times and allows me, like a weather forecaster saying there is a 90% chance of rain, to give some estimates of the uncertainty in the outcome. Last time it was run, on Saturday, the uncertainty was tiny: Starmer could expect to win the first ballot 99.4% of the time, and Nandy, who previously had a very small (0.5%) chance of coming second, was always coming third.
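To show how repeated runs turn into probabilities, here is a heavily simplified sketch of the election simulation. The membership per CLP, the spread of turnout and the size of the local variation are invented for illustration, and it adds up expected votes per CLP rather than drawing individual voters, so its output should not be expected to match the figures above.

```python
import random

CANDIDATES = ["Starmer", "Long-Bailey", "Nandy", "Thornberry"]
SHARES = [31.3, 27.1, 23.8, 17.8]  # fitted shares quoted earlier

def run_election(rng, n_clps=645, members=500, mean_turnout=0.78,
                 turnout_sd=0.05, local_sd=3.0):
    """One simulated first-preference count, summed over all CLPs.

    members, turnout_sd and local_sd are illustrative guesses, not fitted values.
    """
    totals = [0.0] * len(SHARES)
    for _ in range(n_clps):
        # Turnout varies by CLP around the national mean, kept within (0, 1].
        turnout = min(1.0, max(0.01, rng.gauss(mean_turnout, turnout_sd)))
        # Local factors nudge each candidate's support up or down.
        local = [max(0.1, s + rng.gauss(0, local_sd)) for s in SHARES]
        scale = members * turnout / sum(local)
        for i, share in enumerate(local):
            totals[i] += share * scale
    return totals

def outcome_frequencies(runs=200, seed=0):
    """Fraction of runs in which each candidate finishes in each position."""
    rng = random.Random(seed)
    counts = {name: [0] * len(CANDIDATES) for name in CANDIDATES}
    for _ in range(runs):
        totals = run_election(rng)
        order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
        for position, i in enumerate(order):
            counts[CANDIDATES[i]][position] += 1
    return {name: [c / runs for c in row] for name, row in counts.items()}

freqs = outcome_frequencies()
print(f"Starmer first in {freqs['Starmer'][0]:.1%} of runs")
```

The first entry in each candidate's list is the fraction of runs in which they topped the first ballot, which is the kind of figure quoted above. Raising `runs` towards 7000 narrows the sampling error on those fractions.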

How reliable is this? Well, it’s a projection, not a prediction: the eligibility criteria for the main ballot are different (and not all the voters will be Labour members). It doesn’t account for preference voting and it doesn’t account for regional support (e.g., Long-Bailey has done well in the North West and while this boosts her nomination numbers it may also skew the estimate of her support higher than reality). Nobody really thinks that Emily Thornberry is winning 18% support either – it looks more likely that her team has targeted some local parties to win nominations.

But all that said, I don’t think it’s a bad model either – earlier opinion polls suggested Nandy’s support was in single-figure percentages, and the nominations (and the model) show that is wrong or outdated. The core message – that Starmer has a small but solid and stable lead – seems right.