Dear Analyst #36: What The Economist’s model for the 2020 presidential election can teach us about forecasting

On a recent episode of The Intelligence, the data editor at The Economist spoke about a U.S. presidential election forecast the publication is working on. I looked into the model, and in this episode I discuss some of its features and parameters and what makes the forecast unique. Some of the techniques used in The Economist’s model can be applied to your own forecasting use cases. For a summary of The Economist’s model, see this page. Learn more about how the model works on this page.

Source: The Economist

Key takeaways and a caveat

The model uses machine learning and multiple data sources, and it’s easy to get caught up in the details. Here are the key takeaways as described by Dan Rosenheck, the data editor at The Economist:

  1. Machine learning is used to create equations that predict the outcome of the 2020 presidential election
  2. Polls are not very reliable early on in the election cycle
  3. Partisan non-response bias can make a supporter more or less likely to respond to a pollster when there is extremely good or bad news about that supporter’s party or candidate

A caveat: The Economist’s model and the various forecasting techniques it uses are definitely outside of my knowledge and skillset. Most of this episode is me learning more about the model and interpreting some of the results. You don’t have to be a statistics programmer or data science professional to appreciate what the data team has done at The Economist. If you are working with data in any capacity, learning about subjects outside your comfort zone will only make you more knowledgeable about the data analysis process.

Fundamentals vs. early polling

One key finding from the model is that polls conducted in the first half of an election year are a pretty weak predictor of results. On the other hand, fundamental measures like the president’s approval rating, GDP growth, and whether there is an incumbent running for re-election are much better predictors. This chart shows the difference between poll results and fundamentals for predicting the outcome in 1992:

Source: The Economist

The model primarily relies on these fundamental indicators, but over time the polls become a better indicator for predicting the outcome. In the last week leading up to the election in November, more weight is applied to the polls than the fundamentals.
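Here is a rough Python sketch of the idea of shifting weight from fundamentals to polls as election day approaches. To be clear, this is not The Economist’s actual method: the function name `blend_forecast`, the exponential weighting, and the `crossover_days` parameter are all my own assumptions, chosen only so that polls outweigh fundamentals inside the final week.

```python
def blend_forecast(fundamentals_share, poll_share, days_to_election, crossover_days=7.0):
    """Blend two vote-share estimates with a weight that shifts toward polls.

    Hypothetical weighting scheme (an assumption for illustration): the poll
    weight is exactly 0.5 when `days_to_election == crossover_days`, rises
    toward 1.0 on election day, and falls toward 0.0 far from the election.
    """
    if days_to_election < 0:
        raise ValueError("days_to_election must be non-negative")
    poll_weight = 0.5 ** (days_to_election / crossover_days)
    return poll_weight * poll_share + (1.0 - poll_weight) * fundamentals_share


# Made-up inputs: fundamentals say 52%, polls say 56%.
for days in (200, 30, 7, 0):
    print(days, round(blend_forecast(0.52, 0.56, days), 4))
```

Far from the election the blended number sits almost on top of the fundamentals estimate, and on election day it equals the polling average.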

The visualization below shows that early polls tend to overestimate a party’s share of the vote (in this case the Democratic share) compared to fundamental indicators. As you get closer to election day, however, the polls start to become a better predictor:

Source: The Economist

Overfitting data

One downside The Economist points out with other models that try to forecast the presidential election is that equations are created that overfit to historical data points. Think about it: if you tried to create an equation to predict who would win the NBA championship in 2020 based on 1990s data, you may create an equation that leans heavily to the Bulls. Unfortunately, Michael Jordan isn’t playing anymore and the 2020 NBA season is now being played in a bubble in Orlando.

Had to mention Jordan somewhere in this post 🙂

The Economist uses machine learning to better predict the outcome of the presidential election, relying on two techniques which I’ll try to explain in layman’s terms from reading the post:

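  1. Elastic-net regularisation – Simplify the equation you’re using to predict the outcome by shrinking the weights on less useful variables, some of them all the way to zero
  2. Leave-one-out cross-validation – Hold out one data point at a time, train the model on everything else, and check how well it predicts the point that was held out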

#2 is similar to backtesting, a pretty common technique I’ve seen used in finance. Take actual results and see how well your forecast would have predicted last quarter or last year.

In the context of the presidential election, let’s say the model is trying to predict what the outcome of the 1948 election would’ve been (the incumbent Harry Truman defeated Thomas Dewey). The model is trained on all the other years of data except for 1948. The model fit on those other years is then used to predict 1948, and that prediction is compared against the actual result. Repeating this for every election year shows which version of the model is best at predicting outcomes it hasn’t seen.
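To make the two techniques a bit more concrete, here is a minimal Python sketch that combines them on a toy problem with a single predictor. Everything here is an assumption for illustration: the data is made up, the function names are mine, and the one-variable closed-form elastic-net fit is a textbook simplification, not The Economist’s code (their model is far richer than this).

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; return 0 if it is smaller."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net_1d(xs, ys, lam, alpha=0.5):
    """Fit y = intercept + slope * x with an elastic-net penalty on the slope.

    `lam` is the overall penalty strength; `alpha` mixes the L1 part (which can
    zero out the slope) with the L2 part (which shrinks it). lam=0 is plain
    least squares.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    var = sum((x - x_mean) ** 2 for x in xs) / n
    slope = soft_threshold(cov, lam * alpha) / (var + lam * (1.0 - alpha))
    return slope, y_mean - slope * x_mean


def loocv_error(xs, ys, lam, alpha=0.5):
    """Mean squared error when each point is predicted from all the others."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # everything except the held-out point
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_elastic_net_1d(train_x, train_y, lam, alpha)
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total / len(xs)


# MADE-UP toy data: net approval rating vs. incumbent-party vote share (%).
approval = [-10.0, -4.0, 0.0, 3.0, 8.0, 15.0]
vote_share = [46.5, 48.0, 50.5, 50.0, 53.0, 55.5]

# Try a few hypothetical penalty strengths and keep the one that predicts
# held-out points best.
candidate_lams = [0.0, 0.5, 2.0, 10.0]
errors = {lam: loocv_error(approval, vote_share, lam) for lam in candidate_lams}
best_lam = min(errors, key=errors.get)
print(errors, best_lam)
```

The design choice worth noticing is that the penalty strength is picked by how well the model predicts points it was not trained on, which is what guards against the overfitting problem described above.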

State polling

The model also looks at state-level polling data. What’s interesting about the state model is how it uses demographic data like population density and the share of voters that are white evangelical Christians to determine how similar two states are in terms of voter preferences:

Source: The Economist

In the visualization above, Wisconsin is more similar to Ohio than Nevada is to Ohio.
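One simple way to think about “how similar are two states” is to measure the distance between their demographic profiles. The sketch below is only that, a sketch: the numbers are invented placeholders on a 0 to 1 scale (not real demographics), the helper names are hypothetical, and plain Euclidean distance is my assumption, not necessarily the measure The Economist uses.

```python
import math

# INVENTED placeholder values, scaled 0-1. These are NOT real demographics;
# they exist only to show the mechanics of a distance-based comparison.
state_features = {
    "Ohio":      {"population_density": 0.45, "white_evangelical_share": 0.30},
    "Wisconsin": {"population_density": 0.35, "white_evangelical_share": 0.25},
    "Nevada":    {"population_density": 0.10, "white_evangelical_share": 0.12},
}


def demographic_distance(state_a, state_b, features=state_features):
    """Euclidean distance between two states' feature vectors (smaller = more similar)."""
    keys = features[state_a].keys()
    return math.sqrt(
        sum((features[state_a][k] - features[state_b][k]) ** 2 for k in keys)
    )


def most_similar(target, features=state_features):
    """Return the other state whose profile is closest to `target`."""
    others = [state for state in features if state != target]
    return min(others, key=lambda state: demographic_distance(target, state, features))


print(most_similar("Ohio"))
```

In a real model, states that sit close together like this would be expected to move together, so a polling shift in one tells you something about the other.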

A note about partisan non-response bias

I had never heard this term before, and I think the way the team accounts for this bias makes their model more accurate and unique. They take polling data from major sources like ABC and The Washington Post and track the changes in poll results over time. This means they can account for any irregularities in the data so that large swings in opinion due to news about a candidate don’t impact the model too much.
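To see why this bias matters, here is a tiny Python sketch of the mechanism with made-up numbers. It only illustrates how a poll can swing when nobody has changed their mind; it does not reproduce how The Economist corrects for it, and the function name and response rates are my own assumptions.

```python
def poll_share(electorate, response_rate, party="dem"):
    """Share of poll respondents who support `party`.

    `electorate` maps each party to its true share of voters, and
    `response_rate` maps each party to the fraction of its supporters
    who actually answer the pollster.
    """
    responders = {p: electorate[p] * response_rate[p] for p in electorate}
    return responders[party] / sum(responders.values())


# MADE-UP numbers: the electorate is split evenly and stays that way.
electorate = {"dem": 0.5, "rep": 0.5}

# Normal week: both sides answer the phone at the same rate.
normal_week = poll_share(electorate, {"dem": 0.10, "rep": 0.10})

# Bad news week for Republicans: their supporters respond a little less often.
bad_news_week = poll_share(electorate, {"dem": 0.10, "rep": 0.08})

print(round(normal_week, 3), round(bad_news_week, 3))
```

With these inputs the measured Democratic share moves from 0.5 to about 0.556 even though the true split never changed, which is exactly the kind of swing a model shouldn’t overreact to.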

Looking at the us-potus-model repo

One visualization that caught my eye in the source code The Economist released is this one showing the model results vs. the polls vs. actuals from the 2008, 2012, and 2016 elections. Notice how in 2008 and 2012 the model, prior, and result are much closer together than in 2016? It just shows the level of uncertainty that went into the 2016 prediction.

2008

2012

2016

Speaking of uncertainty, I like this commit message from when the team was refining the model back in March:

We have chronic uncertainty.

Other Podcasts & Blog Posts

In the second half of the episode, I talk about some episodes and blog posts from other people that I found interesting: