Modelling the US Presidential Election in R

Poll 30/8

In many ways this is a lot easier than modelling the British Labour Party leadership election – we only care about two candidates, Biden and Trump, and they capture the vast majority of the votes, so I feel reasonably confident about my first simplification: ignoring the votes of all others and simply focusing on the Democrats and the Republicans.

The next simplifying assumption, though, is the big one, and one for which the case is disputed: that a US-wide poll can allow us to predict state-by-state results. I don’t, however, simply plug a nationwide opinion poll into a uniform swing model.
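For contrast, this is roughly what a uniform swing model does: it adds the same national change in vote share to every state. A minimal sketch in R, where the state names, shares and swing are all made up for illustration and are not taken from my data file:

```r
# Uniform swing: add the same national swing to every state's previous share.
# All numbers here are hypothetical, for illustration only.
prev_dem <- c(StateA = 0.46, StateB = 0.52, StateC = 0.49)  # previous Dem two-party share
national_swing <- 0.02                                      # assumed 2-point swing to the Democrats

uniform_swing <- function(prev_share, swing) {
  prev_share + swing
}

projected <- uniform_swing(prev_dem, national_swing)
dem_wins <- projected > 0.5   # a state is won with more than half the two-party vote
print(projected)
print(dem_wins)
```

Every state moves by exactly the same amount, with no allowance for sampling error or local variation.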

Rather than use such a simple arithmetic system, I simulate the election 3,000,000 times – and each time randomly (stochastically) vary the inputs:

  • First of all I vary the Biden/Democrat score to simulate sampling error in the poll – assuming samples are normally distributed around the reported number and that 95% of all samples will give a result within +/- 3 percentage points of the reported number. (As we are ignoring all third party effects I simply apply the negative of this variation to the Republican score.)
  • Secondly, state by state, I repeat this process but with a bigger range – instead of a standard deviation of about 1.5% for the poll, I use 2% for the state.
  • Thirdly – and this is entirely subjective – I make an estimate of the long-term trend in a given state and use this as a way of adding further randomness: e.g. if I assume that there has been a ‘core’ 0.5% shift to the Democrats then an additional random factor is added, drawn from a normal distribution with 0.005 as both its mean and its standard deviation (in theory, of course, this could be a negative, though that is unlikely).
  • Then – to calculate the likely outcome I look at the probabilities of the different totals of Electoral Votes across all three million simulations.
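The first three steps can be sketched for a single state as follows. This is an illustration rather than the full script: the state's previous shares, the national swing and the trend are placeholder values, and the number of simulations is cut down so that it runs instantly.

```r
set.seed(42)
samples <- 100000                     # far fewer than 3,000,000, for speed

# Step 1: national sampling error. 95% of a normal distribution lies within
# 1.96 standard deviations, so +/- 3 points implies sd = 0.03 / 1.96.
poll_sd <- 0.03 / 1.96
dem_diff <- rnorm(samples, 0, poll_sd)
rep_diff <- -dem_diff                 # two-party world: Rep moves the opposite way

# Step 2: state-level noise with a wider spread (sd of 2 points).
state_diff <- rnorm(samples, 0, 0.02)

# Step 3: subjective long-term trend, e.g. 0.5% towards the Democrats,
# drawn with the trend as both mean and standard deviation.
trend <- 0.005
local_diff <- rnorm(samples, trend, abs(trend))

# Hypothetical state: previous shares and an assumed national swing.
dem_prev <- 0.47
rep_prev <- 0.49
swing <- 0.02
dem_proj <- dem_prev + swing + dem_diff + state_diff + local_diff
rep_proj <- rep_prev - swing + rep_diff - state_diff - local_diff

# Share of simulations in which the Democrat finishes ahead.
p_dem_win <- mean(dem_proj > rep_proj)
cat("Share of simulations won by the Democrat:", round(p_dem_win, 3), "\n")
```

The fourth step then repeats this for every state and adds up the Electoral Votes won in each simulation.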

The states I think are swinging to the Democrats (all 0.5% swings unless stated otherwise) are:

Alabama, Alaska, Arizona, California, Colorado (0.2%), Delaware, DC (0.01%), Florida, Georgia, Hawaii (0.2%), Idaho, Illinois, Kansas, Louisiana, District 1 Maine (0.1%), Maryland, Massachusetts, Mississippi, Montana (0.2%), Nebraska (0.2%), District 2 Nebraska, Nevada (0.1%), New Mexico (0.2%), New York (0.1%), North Carolina, Oklahoma (0.1%), Oregon, South Carolina (0.1%), Texas, Utah, Vermont, Virginia.

Swinging to the Republicans (again 0.5% unless stated otherwise):

Arkansas (0.1%), Connecticut (0.2%), Indiana, Iowa, Kentucky (0.2%), Maine, District 2 Maine, Michigan, Minnesota, Missouri, District 1 Nebraska (0.1%), District 3 Nebraska (0.1%), New Hampshire (0.2%), New Jersey (0.1%), North Dakota, Ohio, Pennsylvania, Rhode Island, South Dakota, Tennessee (0.2%), West Virginia, Wisconsin, Wyoming.

(Obviously a lot of these swings make no real difference because the state is locked up for one side or the other anyway.)

So these are the sort of results generated (based on the YouGov America poll published on 30 August, which had Biden at 47% and Trump at 41%) – as you can see there is a massive amount of spurious precision:

Alabama : Biden has 0.0005333333 % chance of winning
Alaska : Biden has 2.1635 % chance of winning
Arizona : Biden has 62.34907 % chance of winning
Arkansas : Biden has 0.0003333333 % chance of winning
California : Biden has 100 % chance of winning
Colorado : Biden has 98.54147 % chance of winning
Connecticut : Biden has 99.985 % chance of winning
Delaware : Biden has 99.96987 % chance of winning
DC : Biden has 100 % chance of winning
Florida : Biden has 75.8858 % chance of winning
Georgia : Biden has 47.0369 % chance of winning
Hawaii : Biden has 100 % chance of winning
Idaho : Biden has 0 % chance of winning
Illinois : Biden has 99.9998 % chance of winning
Indiana : Biden has 0.07203333 % chance of winning
Iowa : Biden has 5.133933 % chance of winning
Kansas : Biden has 0.07423333 % chance of winning
Kentucky : Biden has 6.666667e-05 % chance of winning
Louisiana : Biden has 0.2554 % chance of winning
Maine : Biden has 93.11677 % chance of winning
Maine1 : Biden has 99.99787 % chance of winning
Maine2 : Biden has 10.71063 % chance of winning
Maryland : Biden has 100 % chance of winning
Massachusetts : Biden has 100 % chance of winning
Michigan : Biden has 76.00387 % chance of winning
Minnesota : Biden has 81.57777 % chance of winning
Mississippi : Biden has 0.8029667 % chance of winning
Missouri : Biden has 0.0724 % chance of winning
Montana : Biden has 0.155 % chance of winning
Nebraska : Biden has 0 % chance of winning
Nebraska1 : Biden has 0.0053 % chance of winning
Nebraska2 : Biden has 75.95797 % chance of winning
Nebraska3 : Biden has 0 % chance of winning
Nevada : Biden has 91.20317 % chance of winning
New_Hampshire : Biden has 79.82397 % chance of winning
New_Jersey : Biden has 99.98613 % chance of winning
New_Mexico : Biden has 99.8471 % chance of winning
New_York : Biden has 99.99997 % chance of winning
North_Carolina : Biden has 62.3821 % chance of winning
North_Dakota : Biden has 0 % chance of winning
Ohio : Biden has 19.68823 % chance of winning
Oklahoma : Biden has 0 % chance of winning
Oregon : Biden has 99.97153 % chance of winning
Pennsylvania : Biden has 62.3612 % chance of winning
Rhode_Island : Biden has 99.99227 % chance of winning
South_Carolina : Biden has 3.431833 % chance of winning
South_Dakota : Biden has 0 % chance of winning
Tennessee : Biden has 3.333333e-05 % chance of winning
Texas : Biden has 19.67437 % chance of winning
Utah : Biden has 0.0032 % chance of winning
Vermont : Biden has 100 % chance of winning
Virginia : Biden has 98.81823 % chance of winning
Washington : Biden has 99.99973 % chance of winning
West_Virginia : Biden has 0 % chance of winning
Wisconsin : Biden has 62.35563 % chance of winning
Wyoming : Biden has 0 % chance of winning
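
One way to see how few of those digits mean anything: even taking the model at face value, a probability estimated from n simulations carries a Monte Carlo standard error of roughly sqrt(p(1-p)/n). A rough sketch, with a function name of my own choosing, which says nothing about the far larger uncertainty in the model's assumptions:

```r
# Standard error of a proportion estimated from n independent simulations.
mc_standard_error <- function(p, n) sqrt(p * (1 - p) / n)

n <- 3000000
p <- 0.6234907      # Arizona, from the output above
se_pct <- 100 * mc_standard_error(p, n)   # in percentage points

cat("Arizona:", round(100 * p, 1), "% +/-", signif(se_pct, 2),
    "points (simulation noise only)\n")
```

For Arizona that comes to about 0.03 percentage points, so anything beyond the first decimal place is noise even before the quality of the inputs is considered.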

In terms of Electoral Votes we have this:

Here the red vertical line is the winning post of 270 EVs and this shows Biden has a slightly greater than 80% chance of getting there.

What does that mean? Well, you can think of it like the weather forecast (indeed the methodologies are similar) – if the Met Office said there was an 80% chance of it raining, would you wear a raincoat?

The blue line marks the 50% point of the cumulative distribution, so where the curve crosses it we get a central (median) expectation for the number of EVs Biden will win – in this case just under 310.
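
Given a vector of simulated Electoral Vote totals, both the chance of reaching 270 and the point where the curve crosses the blue line can be read off directly. The totals below are randomly generated stand-ins, not output from my model:

```r
set.seed(1)
# Stand-in for the simulated totals: hypothetical, roughly bell-shaped values
# clamped to the possible range of 0 to 538 Electoral Votes.
ev_totals <- pmin(538, pmax(0, round(rnorm(100000, mean = 310, sd = 45))))

rr <- ecdf(ev_totals)                 # cumulative distribution of the totals
p_win <- mean(ev_totals >= 270)       # share of simulations reaching 270
median_evs <- median(ev_totals)       # where the curve crosses 0.5

cat("P(270 or more):", round(p_win, 3), "\n")
cat("Median EVs:", median_evs, "\n")
```

Because the curve is cumulative, the chance of winning is one minus its height just to the left of the red line, which is the same thing as `1 - rr(269)` here.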

Here’s the (slightly scrappy) code – unlike the Labour Party code I used earlier in the year I am making proper use of R’s vectorisation capabilities, so although I am running a lot of simulations it only takes a few seconds. A GitHub repo (with the baseline data) will follow in due course.

(If you want to know more about statistics I cannot recommend this book too highly.)

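#!/usr/bin/env Rscript

#number of simulations and the poll shares quoted above
samples<-3000000
dem<-0.47
repub<-0.41
#ASSUMPTION: corDem and dem16 are not defined in this listing; two-party
#national shares (this poll, and the 2016 result) are used as stand-ins
corDem<-dem/(dem + repub)
dem16<-0.482/(0.482 + 0.461)

route270<-data.frame(State=as.character(0), EVs=as.integer(0), Chance=as.double(0), stringsAsFactors=FALSE)
#running total of Biden's EVs in each simulation
theScore<-numeric(samples)

swing<-corDem - dem16

#US.csv needs the columns State, EVs, D16, R16, Turnout and Trend
us2016<-read.csv(file='US.csv', stringsAsFactors = FALSE)

#national sampling error: 95% of samples within +/- 3 points of the poll
demDiff<-dem-rnorm(samples, dem, 0.03/1.96)
repDiff<- -demDiff

#state odds
for (i in 1:nrow(us2016)) {
  #state-level noise
  stateDiff<-rnorm(samples, 0, 0.02)
  #generate additional factor
  trendFactor<-us2016[i, ]$Trend
  localDemDiff<-rnorm(samples, trendFactor/100, abs(trendFactor)/100)
  localRepDiff<- -localDemDiff
  #projected shares (in percent) and votes for every simulation at once
  demProjection<-us2016[i,]$D16 + swing * 100 + demDiff * 100 + localDemDiff * 100 + stateDiff * 100
  repProjection<-us2016[i,]$R16 - swing * 100 + repDiff * 100 + localRepDiff * 100 - stateDiff * 100
  demVote<-us2016[i,]$Turnout * demProjection/100
  repVote<-us2016[i,]$Turnout * repProjection/100
  demVictoryMargin<-demVote - repVote
  #EVs won by Biden in each simulation (zero where he loses the state)
  demWin<-(demVictoryMargin > 0) * us2016[i,]$EVs
  theScore<-theScore + demWin
  z<-sum(demVictoryMargin > 0)
  if (z > 0) {
    route270<-rbind(route270, c(as.character(us2016[i,]$State), as.integer(us2016[i,]$EVs), as.double(z/samples)))
  }
  cat(as.character(us2016[i,]$State),": Biden has ", z/(samples/100), "% chance of winning\n")
}

#ASSUMPTION: rr is not defined in this listing; the empirical cumulative
#distribution of the EV totals matches the axis labels below
rr<-ecdf(theScore)
plot(rr, main="Biden electoral votes", xlab="Electoral votes", ylab="Cumulative probability")
abline(v=270, col="red")
abline(h=0.5, col="blue")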
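
On the vectorisation point: R's random number functions generate a whole vector in one call, which is why millions of draws are cheap. A small comparison of the two styles, with variable names of my own choosing and a much smaller number of draws:

```r
n <- 10000

# Loop style: one draw per iteration.
set.seed(7)
looped <- numeric(n)
for (i in seq_len(n)) looped[i] <- rnorm(1, 0, 0.02)

# Vectorised style: all draws in a single call.
set.seed(7)
vectorised <- rnorm(n, 0, 0.02)

# With the same seed, the two approaches can be compared value by value.
same_values <- isTRUE(all.equal(looped, vectorised))
cat("Same values from both approaches:", same_values, "\n")
```

The vectorised form also lets the arithmetic that follows (adding swings, comparing margins) run over all simulations at once rather than one at a time.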


2 responses to “Modelling the US Presidential Election in R”

  1. Lol.. this looks remarkably like the famous NYT graph that gave Clinton an 80+ percent chance of winning in 2016.

    You can’t model true sentiment in advance, only apparent or inferred sentiment, the latter of which is the downfall of most polls because it’s subject to so much personal bias.

    1. Thanks for the comment, but actually the charts don’t look alike in any way and nor are they charting the same thing.
      “Most polls” aren’t failures at all – but of course you are right – if people systematically lie to pollsters then polls, no matter how well done, will be wrong. But I think that given, in general, polls are good predictors of political outcomes world-wide, there is good reason to think they are a good basis on which to speculate on the outcome of the election.
      But, of course, the chart shown here is not an election result and it’s also based on a poll taken more than two months before election day. There will be more polls and they will show different results.
