Modelling the US Presidential Election in R


In many ways this is a lot easier than modelling the British Labour Party leadership election – we only care about two candidates, Biden and Trump, and they capture the vast majority of the votes, so I feel reasonably confident about my first simplification: ignoring the votes of all others and simply focusing on the Democrats and the Republicans.

The next simplifying assumption, though, is the big one, and the case for it is disputed: that a US-wide poll can be used to predict state-by-state results. Even so, I don’t simply plug a nationwide opinion poll into a uniform swing model.
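For reference, this is the plain uniform-swing arithmetic that the simulation starts from. The sketch converts the national poll to a two-party share, compares it with the 2016 two-party share and moves one state by the difference. The national figures are the ones used later in this post. The 47/50 state is a made-up example.

```r
# National poll (YouGov America, 30 August) and 2016 two-party share
dem <- 0.47
gop <- 0.41
dem16 <- 0.511

# Two-party Democrat share in the poll and the uniform swing since 2016
twoPartyDem <- dem / (dem + gop)
swing <- twoPartyDem - dem16   # roughly +0.023, i.e. about 2.3 points

# Hypothetical state: Democrats trailed 47% to 50% in 2016
stateD16 <- 47
stateR16 <- 50

# Uniform swing moves every state by the same amount, in both shares
projD <- stateD16 + swing * 100
projR <- stateR16 - swing * 100
cat(sprintf("Swing: %.2f points; projected Democrat margin: %.2f points\n",
            swing * 100, projD - projR))
```

This prints a swing of 2.31 points and a projected Democrat lead of 1.62 points. Under plain uniform swing, the hypothetical state therefore flips with certainty, and the simulation exists to replace that certainty with a probability.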

Rather than use such a simple arithmetic system, I simulate the election 3,000,000 times, each time varying the inputs randomly (stochastically):

  • First of all, I vary the Biden/Democrat score to simulate sampling error in the poll. I assume that samples are normally distributed around the reported number and that 95% of them fall within ±3 percentage points of it. (As we are ignoring all third-party effects, I simply apply the same variation with the opposite sign to the Republican score.)
  • Secondly, state by state, I repeat this process with a wider spread. The poll uses a standard deviation of about 1.5 percentage points, and each state uses 2 points.
  • Thirdly – and this is entirely subjective – I estimate the long-term trend in a given state and use it to add further randomness. For example, if I assume a ‘core’ 0.5% shift to the Democrats, I add a random factor drawn from a normal distribution with a mean of 0.005 and a standard deviation of 0.005. In any single simulation this factor can be negative. Because the mean equals the standard deviation, that happens in about 16% of simulations.
  • Then, to calculate the likely outcome, I look at the distribution of Electoral Vote totals across all three million simulations.
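The three random components above can be sketched for a single state. The sketch makes several arbitrary choices: the seed, the smaller number of draws, and a hypothetical state that is 2 points behind after the uniform swing. It checks that a standard deviation of 0.03/1.96 keeps about 95% of national draws within 3 points. It also shows that setting the trend's mean and standard deviation to the same value reverses the trend in about 16% of draws.

In the full script the national draw is made once per simulation and shared by every state, which is what correlates the state results. With a single state, that distinction disappears.

```r
set.seed(2020)   # arbitrary seed, only for reproducibility
n <- 100000      # far fewer draws than the 3,000,000 in the full model

# 1. National polling error: 95% within +/- 3 points, so sd = 0.03 / 1.96
demDiff <- rnorm(n, 0, 0.03 / 1.96)
withinThree <- mean(abs(demDiff) <= 0.03)

# 2. State-level error with a standard deviation of 2 points
stateDiff <- rnorm(n, 0, 0.02)

# 3. Subjective trend: a 0.5-point shift used as both mean and sd
trend <- 0.005
localDiff <- rnorm(n, trend, abs(trend))
reversed <- mean(localDiff < 0)   # draws where the trend runs the other way

# Hypothetical state: Democrats 2 points behind after the uniform swing.
# Each error raises the Democrat share and lowers the Republican share by the
# same amount, so the margin moves by twice their sum.
baseMargin <- -0.02
margin <- baseMargin + 2 * (demDiff + stateDiff + localDiff)
pWin <- mean(margin > 0)

cat("Within 3 points:", withinThree, "\n")
cat("Trend reversed:", reversed, "\n")
cat("Democrat win probability:", pWin, "\n")
```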

The states I think are swinging to the Democrats (all 0.5% swings unless stated otherwise) are:

Alabama, Alaska, Arizona, California, Colorado (0.2%), Delaware, DC (0.01%), Florida, Georgia, Hawaii (0.2%), Idaho, Illinois, Kansas, Louisiana, District 1 Maine (0.1%), Maryland, Massachusetts, Mississippi, Montana (0.2%), Nebraska (0.2%), District 2 Nebraska, Nevada (0.1%), New Mexico (0.2%), New York (0.1%), North Carolina, Oklahoma (0.1%), Oregon, South Carolina (0.1%), Texas, Utah, Vermont, Virginia.

Swinging to the Republicans (again 0.5% unless stated otherwise):

Arkansas (0.1%), Connecticut (0.2%), Indiana, Iowa, Kentucky (0.2%), Maine, District 2 Maine, Michigan, Minnesota, Missouri, District 1 Nebraska (0.1%), District 3 Nebraska (0.1%), New Hampshire (0.2%), New Jersey (0.1%), North Dakota, Ohio, Pennsylvania, Rhode Island, South Dakota, Tennessee (0.2%), West Virginia, Wisconsin, Wyoming.

(Obviously a lot of these swings make no real difference because the state is locked up for one side or the other anyway.)
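In the script these trends come from a Trend column in US.csv, measured in percentage points. The code uses that value as both the mean and the standard deviation of each state's local shift. A positive value must therefore mean a swing to the Democrats, and a negative value a swing to the Republicans.

The sketch below shows that encoding for a handful of states. The vector and the localShift helper are illustrative only. Washington appears on neither list, so giving it a trend of zero is an assumption.

```r
# Hypothetical encoding of a few of the trends above, in percentage points:
# positive = towards the Democrats, negative = towards the Republicans.
trends <- c(Arizona = 0.5, Colorado = 0.2, DC = 0.01,
            Michigan = -0.5, Connecticut = -0.2, Arkansas = -0.1,
            Washington = 0)   # on neither list, so assumed to be zero

# Local shift in points: normal with mean trend and sd |trend|
localShift <- function(trend, n) rnorm(n, trend, abs(trend))

set.seed(1)
michigan <- localShift(trends[["Michigan"]], 10000)
print(summary(michigan))
```

A zero trend gives a standard deviation of zero, so that state simply gets no local shift.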

So these are the sorts of results generated (based on the YouGov America poll published on 30 August, which had Biden at 47% and Trump at 41%). As you can see, there is a massive amount of spurious precision:

Alabama : Biden has 0.0005333333 % chance of winning
Alaska : Biden has 2.1635 % chance of winning
Arizona : Biden has 62.34907 % chance of winning
Arkansas : Biden has 0.0003333333 % chance of winning
California : Biden has 100 % chance of winning
Colorado : Biden has 98.54147 % chance of winning
Connecticut : Biden has 99.985 % chance of winning
Delaware : Biden has 99.96987 % chance of winning
DC : Biden has 100 % chance of winning
Florida : Biden has 75.8858 % chance of winning
Georgia : Biden has 47.0369 % chance of winning
Hawaii : Biden has 100 % chance of winning
Idaho : Biden has 0 % chance of winning
Illinois : Biden has 99.9998 % chance of winning
Indiana : Biden has 0.07203333 % chance of winning
Iowa : Biden has 5.133933 % chance of winning
Kansas : Biden has 0.07423333 % chance of winning
Kentucky : Biden has 6.666667e-05 % chance of winning
Louisiana : Biden has 0.2554 % chance of winning
Maine : Biden has 93.11677 % chance of winning
Maine1 : Biden has 99.99787 % chance of winning
Maine2 : Biden has 10.71063 % chance of winning
Maryland : Biden has 100 % chance of winning
Massachusetts : Biden has 100 % chance of winning
Michigan : Biden has 76.00387 % chance of winning
Minnesota : Biden has 81.57777 % chance of winning
Mississippi : Biden has 0.8029667 % chance of winning
Missouri : Biden has 0.0724 % chance of winning
Montana : Biden has 0.155 % chance of winning
Nebraska : Biden has 0 % chance of winning
Nebraska1 : Biden has 0.0053 % chance of winning
Nebraska2 : Biden has 75.95797 % chance of winning
Nebraska3 : Biden has 0 % chance of winning
Nevada : Biden has 91.20317 % chance of winning
New_Hampshire : Biden has 79.82397 % chance of winning
New_Jersey : Biden has 99.98613 % chance of winning
New_Mexico : Biden has 99.8471 % chance of winning
New_York : Biden has 99.99997 % chance of winning
North_Carolina : Biden has 62.3821 % chance of winning
North_Dakota : Biden has 0 % chance of winning
Ohio : Biden has 19.68823 % chance of winning
Oklahoma : Biden has 0 % chance of winning
Oregon : Biden has 99.97153 % chance of winning
Pennsylvania : Biden has 62.3612 % chance of winning
Rhode_Island : Biden has 99.99227 % chance of winning
South_Carolina : Biden has 3.431833 % chance of winning
South_Dakota : Biden has 0 % chance of winning
Tennessee : Biden has 3.333333e-05 % chance of winning
Texas : Biden has 19.67437 % chance of winning
Utah : Biden has 0.0032 % chance of winning
Vermont : Biden has 100 % chance of winning
Virginia : Biden has 98.81823 % chance of winning
Washington : Biden has 99.99973 % chance of winning
West_Virginia : Biden has 0 % chance of winning
Wisconsin : Biden has 62.35563 % chance of winning
Wyoming : Biden has 0 % chance of winning
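One way to report these figures without the spurious precision is to round to whole percentages and flag near-certainties. The simulation noise itself is tiny. With three million runs, the Monte Carlo standard error is at most about 0.03 percentage points, so the extra decimals are precise about the simulation but not about the election. The tidyChance helper and its 1%/99% cut-offs are illustrative choices. The chances themselves are copied from the output above.

```r
# A few of the chances printed above, in percent
chances <- c(Alabama = 0.0005333333, Arizona = 62.34907, Georgia = 47.0369,
             Kentucky = 6.666667e-05, Maine = 93.11677, Vermont = 100)

# Round to whole percentages and flag near-certainties instead of 0 or 100
tidyChance <- function(p) {
  ifelse(p < 1, "<1%", ifelse(p > 99, ">99%", paste0(round(p), "%")))
}
print(tidyChance(chances))

# Monte Carlo standard error with 3,000,000 simulations, in percentage points
mcSe <- 100 * sqrt((chances / 100) * (1 - chances / 100) / 3e6)
print(signif(mcSe, 2))
```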

In terms of Electoral Votes we have this:

Here the red vertical line is the winning post of 270 EVs and this shows Biden has a slightly greater than 80% chance of getting there.
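Because the plot is cumulative, the curve's value at 269 is the chance of falling short (269 EVs or fewer). The chance of reaching the winning post is one minus that value. Equivalently, it is the share of simulations with at least 270 EVs, since a 269-269 tie is not a win.

In this sketch, normally distributed draws with a made-up mean and spread stand in for the script's simulated totals.

```r
set.seed(42)
# Hypothetical stand-in for the simulated EV totals, NOT the model's output
evs <- round(rnorm(10000, mean = 305, sd = 40))

cdf <- ecdf(evs)
pWin <- mean(evs >= 270)        # share of simulations reaching 270
pWinFromCdf <- 1 - cdf(269)     # the same quantity read off the cumulative curve
cat("P(at least 270 EVs):", pWin, "\n")
```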

What does that mean? Well, you can think of it like the weather forecast (indeed the methodologies are similar) – if the Met Office said there was an 80% chance of it raining, would you wear a raincoat?

The blue horizontal line is drawn at a cumulative probability of 0.5, so the point where it crosses the curve gives the median number of EVs Biden will win – in this case just under 310.
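In the script the blue line is drawn with abline(h = 0.5), a horizontal line at a cumulative probability of one half. Its crossing point with the curve therefore reads off the median. The mean has to be computed separately, and the two can differ when the EV distribution is lumpy or skewed. In this sketch, made-up totals stand in for the simulation output.

```r
set.seed(42)
# Hypothetical stand-in for the simulated EV totals, NOT the model's output
evs <- round(rnorm(10000, mean = 305, sd = 40))

cdf <- ecdf(evs)
medianEVs <- median(evs)   # where the curve crosses a cumulative probability of 0.5
meanEVs <- mean(evs)       # the expectation, which the plot does not show directly
cat("Median:", medianEVs, " Mean:", round(meanEVs, 1),
    " Curve height at median:", cdf(medianEVs), "\n")
```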

Here’s the (slightly scrappy) code. Unlike the Labour Party code I used earlier in the year, it makes proper use of R’s vectorisation capabilities, so although I am running a lot of simulations it only takes a few seconds. A GitHub repo (with the baseline data) will follow in due course.
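Here is the vectorisation point in miniature. The sketch counts winning simulations twice: once with an explicit loop, and once with a single vectorised comparison followed by a sum. The margins are random placeholders. Timings will vary by machine, but the vectorised version should be noticeably faster while giving the same count.

```r
set.seed(7)
n <- 200000
margins <- rnorm(n, mean = 0.01, sd = 0.05)   # hypothetical simulated margins

# Loop version: one comparison per simulation
loopWins <- function(m) {
  wins <- 0
  for (x in m) if (x > 0) wins <- wins + 1
  wins
}

# Vectorised version: one comparison over the whole vector, then a sum
vecWins <- function(m) sum(m > 0)

print(system.time(a <- loopWins(margins)))
print(system.time(b <- vecWins(margins)))
print(a == b)
```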

(If you want to know more about statistics I cannot recommend this book too highly.)




#!/usr/bin/env Rscript

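#library("ggplot2")

# args <- commandArgs(trailingOnly = TRUE)

# Number of simulated elections
samples <- 3000000

# National poll (YouGov America, 30 August): Biden and Trump shares
dem <- 0.47
gop <- 0.41   # called gop rather than rep so base::rep() is not masked

# 2016 two-party shares of the national vote
dem16 <- 0.511
gop16 <- 0.489

# Convert the poll to two-party shares, ignoring third parties
total2p <- dem + gop
corDem <- dem / total2p
corGop <- gop / total2p

# Uniform national swing to the Democrats since 2016 (as a fraction)
swing <- corDem - dem16

# Baseline data: one row per state/district with columns
# State, EVs, D16, R16 (2016 shares in %), Turnout and Trend (points)
us2016 <- read.csv(file = "US.csv", stringsAsFactors = FALSE)

# National polling error: 95% of draws within +/- 3 points, so sd = 0.03/1.96.
# The Republican score moves by the same amount in the opposite direction.
demDiff <- rnorm(samples, 0, 0.03 / 1.96)
gopDiff <- -demDiff

# Biden's running EV total in each simulation, and the states he can win
bidenEVs <- numeric(samples)
route270 <- data.frame(State = character(0), EVs = integer(0), Chance = double(0),
                       stringsAsFactors = FALSE)

# State odds
for (i in seq_len(nrow(us2016))) {
  # State-specific error with a standard deviation of 2 points
  stateDiff <- rnorm(samples, 0, 0.02)
  # Subjective long-term trend: mean and sd both set from the Trend column
  trendFactor <- us2016$Trend[i]
  localDemDiff <- rnorm(samples, trendFactor / 100, abs(trendFactor) / 100)
  localGopDiff <- -localDemDiff
  # Projected shares in percent
  demProjection <- us2016$D16[i] + (swing + demDiff + localDemDiff + stateDiff) * 100
  gopProjection <- us2016$R16[i] + (-swing + gopDiff + localGopDiff - stateDiff) * 100
  demVote <- us2016$Turnout[i] * demProjection / 100
  gopVote <- us2016$Turnout[i] * gopProjection / 100
  demWins <- (demVote - gopVote) > 0
  # Add this state's EVs to Biden's total in every simulation he wins it
  bidenEVs <- bidenEVs + demWins * us2016$EVs[i]
  z <- sum(demWins)
  if (z > 0) {
    route270 <- rbind(route270, data.frame(State = us2016$State[i], EVs = us2016$EVs[i],
                                           Chance = z / samples, stringsAsFactors = FALSE))
  }
  cat(us2016$State[i], ": Biden has", z / (samples / 100), "% chance of winning\n")
}

# Cumulative distribution of Biden's EVs across all simulations
rr <- ecdf(bidenEVs)
plot(rr, main = "Biden electoral votes", xlab = "Electoral votes", ylab = "Cumulative probability")
abline(v = 270, col = "red")   # winning post
abline(h = 0.5, col = "blue")  # median outcome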
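The script reads US.csv, which the post says will follow with the GitHub repo. Until then, a throwaway file with the columns the code expects can be used to check that it runs. The column names below are inferred from the script, and the two rows are placeholders, not real election data. The file is written to a temporary directory, so copy it to US.csv in the working directory before running the script.

```r
# Hypothetical stand-in for US.csv: column names inferred from the script,
# values are placeholders and NOT real 2016 results
standIn <- data.frame(
  State   = c("StateA", "StateB"),
  EVs     = c(10L, 20L),
  D16     = c(48, 45),          # Democrat share in 2016, in percent
  R16     = c(49, 52),          # Republican share in 2016, in percent
  Turnout = c(3000000, 5000000),
  Trend   = c(0.5, -0.2),       # assumed long-term trend in points (+ = Democrat)
  stringsAsFactors = FALSE
)
path <- file.path(tempdir(), "US.csv")
write.csv(standIn, path, row.names = FALSE)

# Reading it back gives the columns the script expects
us2016 <- read.csv(path, stringsAsFactors = FALSE)
str(us2016)
```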