In many ways this is a lot easier than modelling the British Labour Party leadership election – we only care about two candidates, Biden and Trump, and they capture the vast majority of the votes, so I feel reasonably confident about my first simplification: ignoring the votes of all others and simply focusing on the Democrats and the Republicans.

The next simplifying assumption, though, is the big one and one for which the case is disputed: that a US-wide poll can allow us to predict state-by-state results. What I don’t do is simply plug a nationwide opinion poll into a uniform swing model, though.
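For comparison, a plain uniform swing calculation would look roughly like the sketch below. The poll shares and the 2016 two-party baseline are the ones used in the script further down; the three states and their 2016 shares are invented purely for illustration.

```r
# Poll figures and 2016 two-party baseline as used in the post's script.
dem <- 0.47
rep <- 0.41
dem16 <- 0.511

# Rescale the poll to a two-party share, since third parties are ignored.
corDem <- dem / (dem + rep)
swing <- corDem - dem16

# Hypothetical 2016 Democratic two-party shares (illustrative only).
states <- data.frame(
  State = c("StateA", "StateB", "StateC"),
  D16 = c(0.46, 0.495, 0.53),
  stringsAsFactors = FALSE
)

# Uniform swing: every state moves by the same national amount.
states$D20 <- states$D16 + swing
states$DemWin <- states$D20 > 0.5
print(states)
```

Every state gets exactly the same shift and the answer is a flat yes or no per state, which is why the simulation approach adds randomness on top.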

Rather than just use such a simple arithmetic system, I instead simulate the election 3,000,000 times – and each time randomly (stochastically) vary the inputs:

- First of all I vary the Biden/Democrat score to simulate sampling error in the poll – assuming samples are normally distributed around the reported number and that 95% of all samples will give a result within 3 percentage points of the reported number. (As we are ignoring all third party effects I simply apply the negative of this variation to the Republican score.)
- Secondly, state by state, I repeat this process but with a bigger range – instead of a standard deviation of about 1.5% for the poll, I use 2% for the state.
- Thirdly – and this is entirely subjective – I make an estimate of the long-term trend in a given state and use this as a way of adding further randomness: e.g. if I assume that there has been a ‘core’ 0.5% shift to the Democrats then an additional random factor is added, drawn from a normal distribution centred on 0.005 with 0.005 as the standard deviation (in theory, of course, the draw could come out negative, though that is the less likely outcome).
- Then – to calculate the likely outcome I look at the probabilities of the different totals of Electoral Votes across all three million simulations.
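The three layers of randomness described above can be sketched for a single state as follows. This is a simplified illustration rather than the full script: the state's 2016 share (`d16`) and the national swing are made-up values, it works with two-party shares that sum to 100%, and it uses far fewer simulations to keep it quick.

```r
set.seed(1)                 # fixed seed so the sketch is repeatable
samples <- 100000           # far fewer than the post's 3,000,000, for speed

# National poll sampling error: a 95% interval of +/- 3 points implies
# a standard deviation of 0.03 / 1.96, i.e. about 1.5 points.
pollError <- rnorm(samples, 0, 0.03 / 1.96)

# State-level error with the wider 2-point standard deviation.
stateError <- rnorm(samples, 0, 0.02)

# Subjective trend: a 'core' 0.5-point shift, with the same figure as the spread.
trend <- 0.005
trendShift <- rnorm(samples, trend, abs(trend))

# Hypothetical state: 2016 two-party Democratic share plus a national swing.
d16 <- 0.49
swing <- 0.023
demShare <- d16 + swing + pollError + stateError + trendShift

# With only two parties, the Democrat wins when the share exceeds 50%.
winProb <- mean(demShare > 0.5)
cat("Simulated Democratic win probability:", round(winProb, 3), "\n")
```

Because each component is a whole vector of draws, the sum is computed for every simulation at once, with no explicit loop over simulations.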

The states I think are swinging to the Democrats (all 0.5% swings unless stated otherwise) are:

Alabama, Alaska, Arizona, California, Colorado (0.2%), Delaware, DC (0.01%), Florida, Georgia, Hawaii (0.2%), Idaho, Illinois, Kansas, Louisiana, District 1 Maine (0.1%), Maryland, Massachusetts, Mississippi, Montana (0.2%), Nebraska (0.2%), District 2 Nebraska, Nevada (0.1%), New Mexico (0.2%), New York (0.1%), North Carolina, Oklahoma (0.1%), Oregon, South Carolina (0.1%), Texas, Utah, Vermont, Virginia.

Swinging to the Republicans (again 0.5% unless stated otherwise):

Arkansas (0.1%), Connecticut (0.2%), Indiana, Iowa, Kentucky (0.2%), Maine, District 2 Maine, Michigan, Minnesota, Missouri, District 1 Nebraska (0.1%), District 3 Nebraska (0.1%), New Hampshire (0.2%), New Jersey (0.1%), North Dakota, Ohio, Pennsylvania, Rhode Island, South Dakota, Tennessee (0.2%), West Virginia, Wisconsin, Wyoming.

(Obviously a lot of these swings make no real difference because the state is locked up for one side or the other anyway.)

So these are the sort of results generated (based on the YouGov America poll published on 30 August, which had Biden at 47% and Trump at 41%) – as you can see there is a massive amount of spurious precision:

Alabama : Biden has 0.0005333333 % chance of winning

Alaska : Biden has 2.1635 % chance of winning

Arizona : Biden has 62.34907 % chance of winning

Arkansas : Biden has 0.0003333333 % chance of winning

California : Biden has 100 % chance of winning

Colorado : Biden has 98.54147 % chance of winning

Connecticut : Biden has 99.985 % chance of winning

Delaware : Biden has 99.96987 % chance of winning

DC : Biden has 100 % chance of winning

Florida : Biden has 75.8858 % chance of winning

Georgia : Biden has 47.0369 % chance of winning

Hawaii : Biden has 100 % chance of winning

Idaho : Biden has 0 % chance of winning

Illinois : Biden has 99.9998 % chance of winning

Indiana : Biden has 0.07203333 % chance of winning

Iowa : Biden has 5.133933 % chance of winning

Kansas : Biden has 0.07423333 % chance of winning

Kentucky : Biden has 6.666667e-05 % chance of winning

Louisiana : Biden has 0.2554 % chance of winning

Maine : Biden has 93.11677 % chance of winning

Maine1 : Biden has 99.99787 % chance of winning

Maine2 : Biden has 10.71063 % chance of winning

Maryland : Biden has 100 % chance of winning

Massachusetts : Biden has 100 % chance of winning

Michigan : Biden has 76.00387 % chance of winning

Minnesota : Biden has 81.57777 % chance of winning

Mississippi : Biden has 0.8029667 % chance of winning

Missouri : Biden has 0.0724 % chance of winning

Montana : Biden has 0.155 % chance of winning

Nebraska : Biden has 0 % chance of winning

Nebraska1 : Biden has 0.0053 % chance of winning

Nebraska2 : Biden has 75.95797 % chance of winning

Nebraska3 : Biden has 0 % chance of winning

Nevada : Biden has 91.20317 % chance of winning

New_Hampshire : Biden has 79.82397 % chance of winning

New_Jersey : Biden has 99.98613 % chance of winning

New_Mexico : Biden has 99.8471 % chance of winning

New_York : Biden has 99.99997 % chance of winning

North_Carolina : Biden has 62.3821 % chance of winning

North_Dakota : Biden has 0 % chance of winning

Ohio : Biden has 19.68823 % chance of winning

Oklahoma : Biden has 0 % chance of winning

Oregon : Biden has 99.97153 % chance of winning

Pennsylvania : Biden has 62.3612 % chance of winning

Rhode_Island : Biden has 99.99227 % chance of winning

South_Carolina : Biden has 3.431833 % chance of winning

South_Dakota : Biden has 0 % chance of winning

Tennessee : Biden has 3.333333e-05 % chance of winning

Texas : Biden has 19.67437 % chance of winning

Utah : Biden has 0.0032 % chance of winning

Vermont : Biden has 100 % chance of winning

Virginia : Biden has 98.81823 % chance of winning

Washington : Biden has 99.99973 % chance of winning

West_Virginia : Biden has 0 % chance of winning

Wisconsin : Biden has 62.35563 % chance of winning

Wyoming : Biden has 0 % chance of winning

In terms of Electoral Votes we have this:

Here the red vertical line is the winning post of 270 EVs and this shows Biden has a slightly greater than 80% chance of getting there.

What does that mean? Well, you can think of it like the weather forecast (indeed the methodologies are similar) – if the Met Office said there was an 80% chance of it raining, would you wear a rain coat?

The blue line marks the 50% point of the cumulative distribution, and where it crosses the curve gives us a central (median) expectation for the number of EVs Biden will win – in this case just under 310.
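Both figures read off the chart can also be computed directly from the vector of simulated totals (called `answers` in the script). The sketch below uses a toy stand-in for that vector: a hypothetical block of 230 safe EVs plus three invented, independent swing states, so the numbers it produces are not the model's results.

```r
set.seed(42)
sims <- 10000

base <- 230                      # hypothetical EVs treated as certain
swingEVs <- c(29, 20, 16)        # hypothetical swing-state EVs
winProb <- c(0.6, 0.6, 0.5)      # hypothetical win probabilities

# One column of 0/1 outcomes per swing state, one row per simulation.
wins <- sapply(winProb, function(p) rbinom(sims, 1, p))
answers <- base + as.vector(wins %*% swingEVs)

# Chance of reaching the winning post of 270 EVs.
pWin <- mean(answers >= 270)

# The level where the cumulative curve crosses 0.5 is the median;
# the mean is shown alongside for comparison.
medianEVs <- median(answers)
meanEVs <- mean(answers)

cat("P(270 or more):", pWin, " median:", medianEVs, " mean:", meanEVs, "\n")
```

In the real model the states are not independent, since they all share the national poll error, so this toy only illustrates the summary step.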

Here’s the (slightly scrappy) code – unlike the Labour Party code I used earlier in the year I am making proper use of R’s vectorisation capabilities, so although I am running a lot of simulations it only takes a few seconds. A GitHub repo (with the baseline data) will follow in due course.

(If you want to know more about statistics I cannot recommend this book too highly.)

```
#!/usr/bin/env Rscript
#library("ggplot2")
#args<-commandArgs("trailingOnly=TRUE")

# Number of simulated elections
samples<-3000000
# National poll shares (YouGov America, 30 August)
dem<-0.47
rep<-0.41
route270<-data.frame(State=as.character(0), EVs=as.integer(0), Chance=as.double(0), stringsAsFactors=FALSE)
theScore<-data.frame(EVs=as.integer(0))
# 2016 national two-party shares
dem16<-0.511
rep16<-0.489
# Rescale the poll to two-party shares and derive the national swing
total2p<-(dem+rep)
corDem<-(dem/total2p)
corRep<-(rep/total2p)
swing<-corDem - dem16
us2016<-read.csv(file='US.csv', stringsAsFactors = FALSE)
# Poll sampling error: 95% of samples within +/- 3 points of the reported figure
demDiff<-dem-rnorm(samples, dem, 0.03/1.96)
repDiff<- -demDiff
#state odds
for (i in seq_len(nrow(us2016)))
{
  # State-level error with a standard deviation of 2 points
  stateDiff<-rnorm(samples, 0, 0.02)
  #generate additional factor
  trendFactor<-us2016[i, ]$Trend
  localDemDiff<-rnorm(samples, trendFactor/100, abs(trendFactor)/100)
  localRepDiff<- -localDemDiff
  # Projected shares (in percent) for every simulation at once
  demProjection<-us2016[i,]$D16 + swing * 100 + demDiff * 100 + localDemDiff * 100 + stateDiff * 100
  repProjection<-us2016[i,]$R16 - swing * 100 + repDiff * 100 + localRepDiff * 100 - stateDiff * 100
  demVote<-us2016[i,]$Turnout * demProjection/100
  repVote<-us2016[i,]$Turnout * repProjection/100
  demVictoryMargin<-demVote - repVote
  # EVs won by Biden in each simulation (0 where he loses the state)
  demWin<-(demVictoryMargin > 0) * us2016[i,]$EVs
  theScore<-cbind(theScore, demWin)
  z<-sum(demVictoryMargin > 0)
  if (z > 0) {
    # Record states Biden wins in at least one simulation
    route270<-rbind(route270, data.frame(State=as.character(us2016[i,]$State), EVs=as.integer(us2016[i,]$EVs), Chance=as.double(z/samples), stringsAsFactors=FALSE))
  }
  cat(as.character(us2016[i,]$State),": Biden has ", z/(samples/100), "% chance of winning\n")
}
# Drop the placeholder first row of route270
route270<-route270[-1,]
# Drop the placeholder EVs column, keeping every simulation
theScore<-theScore[,-1, drop=FALSE]
# Total EVs per simulation, plotted as a cumulative distribution
answers<-rowSums(theScore)
rr<-ecdf(answers)
plot(rr, main="Biden electoral votes", xlab="Electoral votes", ylab="Cumulative probability")
abline(v=270, col="red")
abline(h=0.5, col="blue")
```
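Until the baseline data is published, the layout `US.csv` needs can be inferred from the columns the script reads: `State`, `EVs`, `D16`, `R16`, `Turnout` and `Trend`. The sketch below writes and re-reads a two-row file in that shape. Every value in it is made up, and the units (shares and trend in percent, a negative trend meaning a move towards the Republicans) are assumptions based on how the script uses each column.

```r
# Made-up rows showing the columns the script reads; none of these are real figures.
example <- data.frame(
  State = c("StateA", "StateB"),
  EVs = c(10L, 6L),
  D16 = c(47.5, 52.0),            # assumed: 2016 Democratic share, in percent
  R16 = c(52.5, 48.0),            # assumed: 2016 Republican share, in percent
  Turnout = c(3000000, 1200000),  # assumed: votes cast
  Trend = c(0.5, -0.2),           # assumed: percentage points, negative = towards Republicans
  stringsAsFactors = FALSE
)

# Write to a temporary file and read it back the same way the script does.
path <- file.path(tempdir(), "US_example.csv")
write.csv(example, path, row.names = FALSE)
us2016 <- read.csv(path, stringsAsFactors = FALSE)
str(us2016)
```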

## 2 responses to “Modelling the US Presidential Election in R”

Lol.. this looks remarkably like the famous NYT graph that gave Clinton an 80+ percent chance of winning in 2016. ( https://notrickszone.com/wp-content/uploads/2016/11/NYT-Chances.png )

You can’t model true sentiment in advance, only apparent or inferred sentiment, the latter of which is the downfall of most polls because it’s subject to so much personal bias.

Thanks for the comment, but actually the charts don’t look alike in any way and nor are they charting the same thing.

“Most polls” aren’t failures at all – but of course you are right – if people systematically lie to pollsters then polls, no matter how well done, will be wrong. But I think that given, in general, polls are good predictors of political outcomes world-wide, there is good reason to think they are a good basis on which to speculate on the outcome of the election.

But, of course, the chart shown here is not an election result and it’s also based on a poll taken more than two months before election day. There will be more polls and they will show different results.