How the RacetotheWH Presidential Forecast Works - Methodology

Designed by: Logan Phillips

I. Intro - The Broad Overview

I designed the 2024 Presidential Forecast from the ground up for this election cycle. It's a data-driven forecast that takes into account the latest polling, each state's recent electoral history, and economic trends over the last two years, among other factors, to predict the outcome in every state in the nation. It then feeds the predictions for every state into an electoral college simulation, which runs the election 50,000 times each day.

From the outset, I worked to ensure the forecast was deeply rooted in electoral history. It was built using election data going back to the 1890s and economic data going back to the 1960s, and it was tested relentlessly on every election from 1972 to 2020. Every feature and element of the forecast has been proven to be predictive in past elections, and the overall forecast has been fine-tuned to be as predictive as possible. At the same time, I've tried to keep the forecast nimble enough to emphasize what has been predictive in elections over the last twenty years. I refrained from including elements that may have been predictive before 2000 but no longer improved the forecast's accuracy in recent election cycles.

On the website, I present the findings of the forecast with interactive graphics designed to show viewers what's going on in the race in a way that's easy to understand. My goal is to make it easy for people who aren't following politics too closely to understand the shape of the election, while also providing detailed information for those who follow the race closely and want to dive deep into the details.

A. How the Forecast Works – the Short Version

Before I go into detail on each of the key parts of the forecast, here is a broad overview of how the election forecast works.

The first step of my forecast is to project the national popular vote for the election. This is driven primarily by national polling and by economic data from the last two years, which the forecast translates into a projection for the popular vote. It also considers two secondary factors: special election results since the 2022 midterms and fundraising data from both candidates.

The national popular vote is then used for the second part of the forecast – projecting the lead Kamala Harris or Donald Trump will have in every state. Using the past eight years of election data, I estimate how much each state is likely to vote to the left or right of the national vote. This is combined with state polling to make a forecast for each state. For both the state and national popular vote predictions, polling becomes more influential the closer we get to election day.

Third, the forecast must estimate how much the actual result is likely to deviate from its prediction, based on the last 50 years of elections. In other words, how much uncertainty do we have about this prediction? The lower the uncertainty, the lower the odds of an upset. I combine the projected lead in every state with the estimated uncertainty to predict the chance Kamala Harris and Donald Trump each have of winning that state.

Finally, I plug the results from the forecast into the election simulator. It runs through variations of the election in which the parties randomly over- or underperform in different ways: in the national vote, in individual states, in regions (like the Midwest), and with certain demographic groups (like Latino voters).

Each day, it tracks how often Harris and Trump win the election across the 50,000 simulations, as well as their average electoral vote count. It also tracks how they performed under certain conditions, so we can see how often candidates win without winning the popular vote, or how frequently they match past strong election margins, like President Obama’s 2008 landslide, or President Reagan’s 1984 domination.

Finally, the results from the simulations are published each day so that RacetotheWH viewers can see the latest prediction, and track how the forecast has changed over time.

B. Primary Goals in Building The Forecast:

I had two primary goals when rebuilding the forecasting model for this election. First, I wanted to ensure the forecast was more resilient to polling misses. I think this is critically important, as in the last two presidential election cycles, we saw enormous polling errors in the Midwest and Pennsylvania. In 2020, while the polls correctly pointed to a Biden win, they suggested a landslide, instead of a white-knuckle victory where he won most of the swing states, but by far smaller margins than it appeared entering election day. With declining response rates to polls, we must be prepared for the possibility of another polling miss – potentially in either direction.

Making a forecast more resilient to another big polling miss is far easier in a Senate and House forecast, where we can rely on how incumbents did in their last election, fundraising, and the electoral experience of candidates. These are less useful metrics in a Presidential forecast, and so I had to learn some new skills and implement new strategies.

I spent months building a new economic forecast, which takes economic data and turns it into a projection for the popular vote. Based on my testing of past elections, I believe this has dramatically improved my forecast model's ability to project the national environment. The model also factors in the results of special elections, fundraising, and a new feature called Partisan Drift that helps improve the accuracy of predictions for states that are quickly moving in one party's direction. All of this is explained in much more detail below.

My second goal was to go much further into electoral history. I used election data going back to the 1890s and economic data going back to the 1960s. I tested every single piece of my model rigorously by running it on every presidential election since 1972 in all 50 states, Washington D.C., and the five congressional districts that assign individual electoral votes. Across 12 presidential elections, that gave me 713 state- and district-level results to test the forecast on.

I used this data to fine-tune how much weight each component of the forecast should have, and to test whether new features I wanted to add would actually be predictive. I tested many ideas but discarded those without sufficient evidence, such as including the age gap between presidential candidates or using Senate and Governor races to predict state voting patterns in the next cycle.

I put just as much effort into determining the odds of an upset as I did into projecting the margin of victory in each state. Having so many elections to draw on gave me a wealth of data for testing which signals are most predictive of a higher upset chance in a state.

This provides an overview of how the Forecast works. If you are interested in understanding it in more detail, read on. If not, follow this link and see it in action.

II. How the Forecast Works – the Detailed Version

A. Predicting the Popular Vote

As Hillary Clinton and Al Gore can attest, leading in the national vote does not guarantee victory if a candidate fails to win the right combination of states to reach 270 electoral votes. Nonetheless, predicting the popular vote is a crucial part of the Presidential Forecast, as it offers valuable insight into how each state may vote.

To predict the leader in the national popular vote, I factor in two types of data. The first is the national polling.

The second type is the fundamentals, which include the economic forecast, special election results since the last midterm election, and the campaign fundraising totals for Kamala Harris and Donald Trump.

1. National Polling

The forecast tracks the net lead Kamala Harris or Donald Trump has in the national polls, sourced directly from the RacetotheWH polling average. It utilizes the “All Polls” version, which includes both head-to-head polling and polling that includes third party candidates. It only includes polls conducted entirely after Joe Biden exited the race on July 21st.

The polling average is weighted to prioritize recent polls, those conducted by pollsters with strong track records, and those with large sample sizes.

Here are a few other factors our polling average considers:

  • If a polling firm has released multiple polls, we prioritize their most recent one, and severely reduce the weight of the past polls they conducted.

  • When a pollster releases both a head-to-head poll and one that includes third-party candidates, both are included in the average, but each is weighted at half the value of a standard poll.

  • We adjust the polls for each pollster's historical bias. We are particularly aggressive at correcting bias for partisan pollsters.

The same process is used for the state polling average.
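
To make the weighting concrete, here is a minimal Python sketch of a polling average that prioritizes recent polls and large samples, sharply down-weights a firm's older releases, and adjusts for house effects. The specific decay rate, reference sample size, and repeat-poll penalty are illustrative placeholders, not the actual RacetotheWH formulas.

```python
# A minimal sketch of a recency/quality/sample-size weighted polling average.
# All constants below are illustrative assumptions.
from dataclasses import dataclass
from datetime import date
import math

@dataclass
class Poll:
    pollster: str
    end_date: date
    sample_size: int
    margin: float          # Harris minus Trump, in percentage points
    pollster_bias: float   # historical house effect, in points (hypothetical input)

def weighted_average(polls: list[Poll], today: date) -> float:
    # Find each firm's most recent poll so older ones can be penalized.
    latest_by_pollster: dict[str, date] = {}
    for p in polls:
        latest_by_pollster[p.pollster] = max(
            latest_by_pollster.get(p.pollster, p.end_date), p.end_date)

    total, total_weight = 0.0, 0.0
    for p in polls:
        recency = math.exp(-(today - p.end_date).days / 14)   # assumed 14-day decay
        size = math.sqrt(p.sample_size / 600)                 # assumed reference n = 600
        repeat_penalty = 1.0 if p.end_date == latest_by_pollster[p.pollster] else 0.25
        weight = recency * size * repeat_penalty
        adjusted_margin = p.margin - p.pollster_bias           # correct for house effect
        total += weight * adjusted_margin
        total_weight += weight
    return total / total_weight if total_weight else 0.0
```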

2. The Economic Forecast

Polling is an imperfect snapshot in time of how voters felt last week. Voters can and often do change their minds, especially in presidential elections, so a presidential model can be led astray if it relies on polling alone. In testing the model back to 1972, the economic forecast dramatically improved the forecast's ability to predict the national vote. Eighty days before the election, with polling alone, the forecast misses by 5.57% on average – but that improves to just 3.15% with the economic forecast factored in. The average miss decreases to 2.1% by election day, and to just 1.7% in elections since 2000.

How I designed the Economic Forecast:

The economic forecast looks at how the economy has improved or worsened across 10 different metrics, compared to where the US economy was one year ago. The stronger the economy, the better the incumbent President's party is expected to do.

Here are the 10 indicators, listed in order of their weight in the Economic Forecast:

  1. Manufacturing Sales (Real Manufacturing and Trade Industries Sales)

  2. GDP Per Capita

  3. New Privately-Owned Housing Units Under Construction

  4. Consumer Sentiment (University of Michigan)

  5. Personal Income (Average Hourly Earnings of Production and Nonsupervisory Employees, Total Private)

  6. Total Industrial Production

  7. Jobs (All Employees, Total Nonfarm)

  8. Spending (Personal Consumption Expenditures)

  9. Inflation (Consumer Price Index: All Items: Total for United States)

  10. The Stock Market (NASDAQ Composite Index)

The Economic Forecast looks at the overall shift on a month-by-month basis over the last two years, with more weight for the most recent months.

For each metric, I developed a unique formula that would be as predictive as possible of the popular vote. I did this while testing it on every election since 1972, so I could immediately see how every shift in the formula would change the results. Then, I found the combination of factors that would be the most accurate.
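
To make the structure concrete, here is a stylized Python sketch: a year-over-year change for each of the last 24 months per indicator, a recency weight on each month, and a weighted combination across indicators. In the real model each indicator gets its own tuned formula, as described above; the single generic formula, decay rate, and weights below are illustrative placeholders.

```python
# A stylized sketch of the economic index, not the model's actual coefficients.
def economic_index(monthly_values: dict[str, list[float]],
                   indicator_weights: dict[str, float],
                   decay: float = 0.9) -> float:
    """monthly_values[name] is a list of monthly readings, oldest to newest,
    covering at least the last 36 months so a year-over-year change exists for
    each of the last 24 months. Weights can be negative for indicators where
    an increase hurts the incumbent (e.g., inflation)."""
    score = 0.0
    for name, series in monthly_values.items():
        yoy = [(series[i] - series[i - 12]) / abs(series[i - 12])
               for i in range(len(series) - 24, len(series))]
        # Recency weights: the most recent month counts the most.
        weights = [decay ** (len(yoy) - 1 - i) for i in range(len(yoy))]
        avg_change = sum(w * c for w, c in zip(weights, yoy)) / sum(weights)
        score += indicator_weights[name] * avg_change
    return score  # positive = improving economy, a tailwind for the incumbent party
```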

The Economic Forecast proved to be just as predictive in open races without an incumbent as it was when a President was running for re-election. As a result, the economic forecast carries the same weight whether Kamala Harris or President Biden is the nominee. The Economic Forecast will continue to be updated with the latest data throughout the election.

3. Performance in Special Elections

The results of special elections since the last midterm election provide insight into the type of national environment we will have in 2024. I take the Partisan Lean of every district with a special election, whether it be a State Senate, State House, Governor, Congressional or Senate race. The Partisan Lean is calculated by comparing how much the district voted to the left or right of the national vote in the last two presidential elections.

Second, I compare the vote in the special election to its partisan lean. For example, if a state senate district had a D+1% partisan lean and Democrats won the special election by 5%, this would suggest a D+4% national environment.

One election alone is not predictive, but when we combine all special elections since 2022, we can get a valuable early read on the election. This is a weighted average, with more weight given to recent elections. Special elections by themselves only have so much predictive value, but in conjunction with other factors, they can improve the accuracy of the forecast.
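
Here is a minimal sketch of that calculation: each special election's result is compared to its district's partisan lean, and the implied national environments are combined in a recency-weighted average. The half-life used for the weighting is an illustrative placeholder.

```python
# A minimal sketch of the special-election signal. The decay constant is assumed.
from datetime import date

def special_election_signal(results: list[dict], today: date,
                            half_life_days: float = 180.0) -> float:
    """Each result: {'date': date, 'margin': float, 'partisan_lean': float},
    with margins and leans expressed as Dem minus GOP, in points."""
    total, total_weight = 0.0, 0.0
    for r in results:
        # A D+5% result in a D+1% district implies a D+4% national environment.
        implied_environment = r['margin'] - r['partisan_lean']
        age = (today - r['date']).days
        weight = 0.5 ** (age / half_life_days)   # more recent elections count more
        total += weight * implied_environment
        total_weight += weight
    return total / total_weight if total_weight else 0.0
```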

To a degree, I took a calculated risk by including special elections in the forecast. I was only able to calculate the partisan lean of state senate and state house districts from 2012 forward, so I could not test it on prior presidential cycles. Nevertheless, in those elections, it significantly enhanced the model's ability to predict the popular vote. I found this to be true in predicting the popular vote in midterm elections as well. In fact, in the 2022 election cycle, the RacetotheWH House forecast took Democrats' outstanding performance in special elections since 2020 as a signal that Democrats were likely to outperform both generic ballot polling and historic expectations for the President's party in the midterms. This was a key reason the House forecast came just one seat shy of perfectly predicting the number of seats the GOP won.

Because of that limited historical data, special elections account for only 13.5% of the 'Fundamentals' in the popular vote prediction. Recent elections suggest this share may be conservative, but I'm staying cautious until I have more cycles to test it on.

4. Fundraising

Fundraising is the final component in projecting the national vote. Ever since the Supreme Court's ruling in Citizens United v. FEC struck down campaign finance regulations, our political system has been awash in dark money that is hard to track. Determining how much money a campaign and all of its political allies have raised is a daunting task, and one that's sensitive to what we actually count as a political ally.

In 1980, the total dollars raised by each of the candidates’ campaigns would be a much stronger signal of their ability to promote their message in advertising, and to fund a get out the vote operation. In 2024, this is no longer the case.

However, campaign fundraising retains predictive value because it signals a campaign's ability to generate grassroots support and how well run the campaign is. The model specifically uses the money campaigns raise from individuals – that means excluding self-funding and super PACs. It calculates the amount each campaign has raised relative to the other.

Fundraising improves the forecast so long as it's only a very small share of the overall prediction. If it were any bigger, it would seriously reduce the accuracy of the forecast, because the ratio can sometimes be lopsided and suggest a landslide where none will happen. That's why its peak influence is 1.2%, 100 days out from election day, shrinking to 0.5% by election day.
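
As a rough illustration of how a lopsided fundraising ratio can be tamed and blended in at a small weight, here is a sketch that takes the log of the ratio of individual contributions and shrinks the weight from 1.2% at 100 days out to 0.5% on election day. The log transform and the linear decay of the weight are my illustrative assumptions, not the model's actual functional form.

```python
# A stylized sketch of the fundraising component; constants and the log
# transform are illustrative assumptions.
import math

def fundraising_signal(dem_individual_dollars: float,
                       gop_individual_dollars: float) -> float:
    # Log of the ratio keeps a lopsided haul from implying a landslide.
    return math.log(dem_individual_dollars / gop_individual_dollars)

def fundraising_weight(days_until_election: int) -> float:
    # 1.2% of the overall prediction at 100 days out, shrinking to 0.5% on election day.
    days = min(max(days_until_election, 0), 100)
    return 0.005 + (0.012 - 0.005) * (days / 100)
```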

5. Putting it All Together

Now, we can combine all four elements to predict the popular vote. I divide these into two overall sections: polling and the fundamentals, which include the economic forecast, special elections, and fundraising. Early in the election cycle, past electoral history shows that polling only has so much predictive value, as Americans are apt to change their minds as they pay more attention to the campaigns. By election day, it becomes much more useful. Consequently, the weight of polling in the forecast starts at 52% 100 days before the election and increases to 80% by election day.
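
In code, the blend looks roughly like the sketch below, which interpolates the polling weight between the 52% and 80% endpoints stated above. Treating that ramp as linear is an assumption for illustration.

```python
# A minimal sketch of the national popular-vote blend. The 52%/80% endpoints
# come from the text above; the linear ramp between them is assumed.
def projected_popular_vote(polling_margin: float,
                           fundamentals_margin: float,
                           days_until_election: int) -> float:
    days = min(max(days_until_election, 0), 100)
    polling_weight = 0.80 - (0.80 - 0.52) * (days / 100)   # 52% at 100 days, 80% on election day
    return polling_weight * polling_margin + (1 - polling_weight) * fundamentals_margin
```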

B. Predicting the Vote in Each State

With the popular vote in hand, I can now predict the result in every state, particularly in the battleground states that will decide which candidate becomes President for the next four years. This is more straightforward than the national forecast and includes only two components: Partisan Lean and state polling.

1. Partisan Lean

This is where the projected national popular vote becomes so important. I use each state's recent electoral history to get a strong read on how the state would be likely to vote if the national vote were tied. This is referred to as the state's partisan lean. The forecast adjusts the state's partisan lean by the projected popular vote to predict how it will vote in the next election. For example, Michigan has an R+1.38% partisan lean – so if Harris wins the popular vote by 2%, that would in theory translate to a Harris +0.62% lead in Michigan.

To calculate the Partisan Lean, the forecast looks at how the state voted relative to the national vote in the last two presidential elections, and the most recent midterm. In midterms, I look at the combined vote in House races for each state relative to the national vote that cycle, after adjusting for districts where parties failed to recruit an opponent.

For this election, I’ve also introduced a new feature called Partisan Drift. I look at how much the state has drifted towards one party over the last 4 presidential elections, and assume it will, on average, continue to drift slightly in that direction in the next election. The degree of drift expected is higher if it has consistently moved in the same direction for four straight cycles.

As far as I’m aware, this addition is unique to RacetotheWH. I tested this on every election since 1972, and found it measurably improved the model’s ability to correctly call the winner in competitive states.
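
Here is a simplified sketch of how the pieces of the fundamentals projection fit together: a weighted partisan lean from recent elections, a drift term, and the projected national vote. The weights and the drift damping are illustrative placeholders; only the final addition step mirrors the Michigan example above.

```python
# A minimal sketch of partisan lean, drift, and the fundamentals projection.
# Leans and margins are Dem minus GOP, in points (so R+1.38% is -1.38).
def partisan_lean(lean_2020: float, lean_2016: float, lean_midterm: float) -> float:
    # Relative performance vs. the national vote, with assumed weights that
    # favor the most recent presidential cycle.
    return 0.5 * lean_2020 + 0.3 * lean_2016 + 0.2 * lean_midterm

def partisan_drift(leans_last_four: list[float], consistent_bonus: float = 1.5) -> float:
    # Average per-cycle movement over the last four presidential elections,
    # amplified if every cycle moved the same way, then damped (assumed 0.5).
    moves = [b - a for a, b in zip(leans_last_four, leans_last_four[1:])]
    drift = sum(moves) / len(moves)
    same_direction = all(m > 0 for m in moves) or all(m < 0 for m in moves)
    return drift * (consistent_bonus if same_direction else 1.0) * 0.5

def fundamentals_projection(national_margin: float, lean: float, drift: float) -> float:
    # e.g. national D+2.0 with an R+1.38 lean (-1.38) and no drift gives Harris +0.62.
    return national_margin + lean + drift
```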

2. State Polling

In races without polling data, the Partisan Lean adjusted by the national vote constitutes the entire state forecast. However, our state polling average carries the most weight in states that have quality polling.

The polling average follows the same methodology described in the national polling section. As the election approaches, the model places greater emphasis on recent polling, as last-minute swings in the polls are common just before election day.

3. Forecasting the Projected Margin of Victory in Each State

The forecast for each state combines the Partisan Lean and state polling to predict the Projected Margin of Victory, or the projected lead for the winning candidate.

The weight given to state polling in each state depends on the quality and recency of that polling. The closer we are to the election, the greater the weight state polling can have, maxing out at just under 80% entering election day.

In a state like Hawaii, there may be only one or two polls conducted during the entire election cycle, if any. In that state, Partisan Lean will be the most important part of the Projected Margin of Victory. In contrast, a state like Georgia will have an abundance of high-quality polling, making polling far more influential than the Partisan Lean.
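
The sketch below shows roughly how that blend could work: the fundamentals number (partisan lean shifted by the national vote) and the state polling average are combined with a weight that grows with polling quality and proximity to election day, capped just under 80%. The shape of the ramp and the quality score are illustrative assumptions.

```python
# A minimal sketch of the state-level blend; the ramp and quality score are assumed.
def state_projection(fundamentals_margin: float,
                     state_polling_margin: float | None,
                     polling_quality: float,      # 0 (none/stale) to 1 (abundant, fresh)
                     days_until_election: int) -> float:
    if state_polling_margin is None or polling_quality <= 0:
        return fundamentals_margin               # e.g. Hawaii, with little or no polling
    days = min(max(days_until_election, 0), 100)
    max_weight = 0.79 * (1 - 0.4 * days / 100)   # grows toward ~79% on election day (assumed shape)
    polling_weight = max_weight * polling_quality
    return polling_weight * state_polling_margin + (1 - polling_weight) * fundamentals_margin
```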

C. Chance to Win in Each State - How Likely is the Forecast to Get it Wrong?

Any election forecaster worth their salt knows that they won't always get it right. Once my forecast makes a prediction of what WILL happen, I must determine how likely I am to get it wrong. Why does this matter? Because I turn my projections into probabilities. The chance the forecast gives a candidate to win depends not just on the projected lead, but also on how much uncertainty the forecast has about its prediction.

For example, let's hypothetically say we are 100 days away from the election, I'm expecting Kamala Harris to win the popular vote by 1.9%, and no polls of Virginia have been released yet. Based on Virginia's partisan lean, we would expect Harris to win the state by a margin of 8.0%.

However, given the long time until election day and the total absence of polling, there would be a far higher risk of the forecast missing – perhaps by even more than 8%. Therefore, the forecast would give Harris just a 79% chance to win VA, and a 21% chance of an upset.

Now let's use a different scenario. It's election day, the forecast still expects Harris to win Virginia by 8%, but this time we have plenty of high-quality polling and very few undecided voters in the polls. Both the fundamentals and the polling parts of our forecast suggest a similar outcome, lowering the odds that we are misreading the race. While a significant miss is still possible, the odds are considerably lower. That same Harris +8% projection would now translate into Harris having a greater than 95% chance of winning.

In my opinion, projecting the likelihood of a forecast miss cannot be done effectively through educated guesses or by examining only the last few election cycles, as this provides too small a sample size. I examined data back to 1972, the earliest election cycle for which I have all the necessary economic data to run the forecast effectively. I tested the forecast 30,000 times at each crucial point throughout the cycle, starting from 100 days before the election.

This provided extensive data on where the forecast was missing and offered clues about which signals indicate a higher or lower likelihood of a miss. It also gave me precise data on the exact amount I should increase or decrease the uncertainty for each state based on each indicator.

Here are the key factors that suggest a forecast miss is more likely:

  • There is no state polling

  • There is low quality/out of date polling

  • There are a high number of undecided voters in the polls

  • The fundamentals part of the forecast and the polling diverge

  • It’s a long time until election day

  • There have been consistent polling misses in that state in recent cycles
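
As a stylized illustration, the sketch below widens or narrows a state's expected miss based on those signals. Every constant is a placeholder; the actual adjustments come from the historical back-testing described above.

```python
# A stylized sketch of adjusting a state's expected forecast miss (uncertainty).
# All multipliers and offsets are illustrative assumptions.
def expected_miss(base_sd: float,
                  has_polling: bool,
                  polling_quality: float,        # 0 to 1
                  undecided_share: float,        # e.g. 0.12 for 12% undecided
                  fundamentals_poll_gap: float,  # |fundamentals - polling| in points
                  days_until_election: int,
                  historical_poll_error: float) -> float:
    sd = base_sd
    if not has_polling:
        sd *= 1.6                              # no polling at all -> much more uncertainty
    else:
        sd *= 1.4 - 0.4 * polling_quality      # better polling -> less uncertainty
    sd *= 1.0 + undecided_share                # more undecideds -> more uncertainty
    sd += 0.15 * fundamentals_poll_gap         # divergent signals widen the band
    sd *= 1.0 + days_until_election / 300      # far from election day -> wider
    sd += 0.25 * historical_poll_error         # states with a history of polling misses
    return sd
```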

After estimating the potential forecast miss, I input both the race forecast and the expected miss into a normal distribution equation to calculate the likelihood of Kamala Harris and Donald Trump winning the election.

The normal distribution equation takes three inputs, and converts them into a probability:

1. The Mean: This is what I expect the most likely outcome to be – the Projected Margin of Victory.

2. Standard Deviation: The expected forecast miss (uncertainty)

3. X: A normal distribution equation calculates the probability of a number being higher or lower than X. In this case, X represents a tie in the election, or 0%. In my forecast, if the number is above 0, Harris wins, and if it's below, Trump wins.

The resulting number indicates the probability of Harris and Trump winning each state.
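
In code, the calculation is a single cumulative normal evaluation, shown below with Python's standard library. The example numbers echo the Virginia scenario above; the exact standard deviation there is my guess.

```python
# The win-probability calculation: mean = Projected Margin of Victory
# (Harris minus Trump), sigma = expected miss, X = 0 is a tied state.
from statistics import NormalDist

def win_probabilities(projected_margin: float, expected_miss: float) -> tuple[float, float]:
    dist = NormalDist(mu=projected_margin, sigma=expected_miss)
    trump = dist.cdf(0.0)       # chance the actual margin lands below zero
    harris = 1.0 - trump        # chance it lands above zero
    return harris, trump

# Example: an 8-point lead with a large expected miss (roughly 9.5 points)
# gives Harris about an 80% chance, in line with the Virginia scenario above.
```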

D. Simulating the Election

The final step is to translate the projections for each state, the national popular vote, and the uncertainty into an overall prediction for the Presidential election at the national level. Specifically, I aim to estimate how likely Kamala Harris and Donald Trump are to win, and the number of electoral votes that they will most likely win.

I input my projections into an election simulator I built, which runs through the election 50,000 times each day. In each simulation, either Harris or Trump will randomly do better than expected in each individual state. The goal is to run through combinations of feasible scenarios that could change the outcome in unexpected ways. In one simulation, Republicans might perform particularly well in Midwestern states, while underperforming with Latino voters. In another, Democrats may overperform with white college-educated voters and in states that traditionally tilt blue.

How Each Simulation is Run:

1. The Popular Vote

The first step in the simulation is to model the popular vote. I use an inverse normal distribution equation, similar to the one used in the State Forecast. This requires three components:

  1. The Mean: the Projected Popular Vote from the Forecast

  2. The Standard Deviation: This represents the expected variation from the projected national popular vote. It slowly declines as we get closer to the election.

  3. Probability: The Probability represents the total range of outcomes that can happen in the simulation. A score just over 0 (like 0.01%) would represent the best-case scenario for Donald Trump, while a score just under 1 (e.g., 99.99%) would represent the best-case scenario for Kamala Harris. A random number generator determines the probability value, which causes the popular vote to vary in each simulation.
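
Here is a minimal sketch of one simulated popular vote using Python's inverse normal (quantile) function. The standard deviation is a placeholder, and the random draw is kept so it can be reused as the shared national number described in the next section.

```python
# A minimal sketch of one simulated popular vote via the inverse normal function.
import random
from statistics import NormalDist

def simulate_popular_vote(projected_popular_vote: float,
                          national_sd: float,
                          rng: random.Random) -> tuple[float, float]:
    u = rng.random()                         # the "Probability" input
    u = min(max(u, 1e-9), 1 - 1e-9)          # keep it strictly inside (0, 1) for inv_cdf
    margin = NormalDist(projected_popular_vote, national_sd).inv_cdf(u)
    return margin, u                         # u doubles as the shared national number later
```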

2. Simulating Each State

As in the election forecast, the popular vote is used to determine the outcome in each state. We use another inverse normal distribution equation to project the outcome for each state, D.C., and the five Congressional Districts that assign individual electoral votes. Here are the three components:

  1. The Mean: I use a modified version of the Projected Margin of Victory for each state, where the projected national popular vote is stripped from the projection. I replace it with the popular vote from each simulation, weighted by the impact of the partisan lean in that state’s projection. In other words, the popular vote has a greater impact in states with less polling, where we must rely more on past electoral history than on current voter polls.

  2. Standard Deviation: This is the same value used in the State Forecast, which measures the likelihood of error in predicting each state.

  3. Probability: Once again, Probability represents the total range of outcomes that can happen in the simulation. However, this time I can't just use one random number for each state. That would lead to a disastrously inaccurate model, because it would treat each state like an island, independent of the others. This approach contributed to some forecasts in 2016 giving Clinton a 99% chance of winning, despite signals indicating there was a real possibility of an upset.
    Instead, the results in each state need to be partially correlated, based on demographic similarities between states. Once again, the probability is driven by random numbers, but instead of using one, we combine three different random numbers.

A. State Probability #: Each State & D.C. has their own random number that is unique to them - except for the Congressional Districts of Maine and Nebraska, which share their number with their parent state.
B. National Probability #: This random # is shared by every state and is the same random number we used for projecting the national vote.
C. Demographic Probability #: America is a big and diverse nation, with great variations across states in race and education. While there is a big national component to these elections, states with certain types of characteristics tend to move together. This figure is a composite of several random numbers based on region, race, education, and the state’s partisan lean. States that are similar (like Michigan and Wisconsin) will have a similar Demographic number. States that are quite different (Hawaii and New Hampshire) will have quite different numbers.

I combine the three pieces into one overall probability number. By this point, the probability number is a blend of so many random numbers that, together, they tend to land much closer to 50% than a single random number would. This is a dangerous flaw – without correction, Trump would win states only 3% to 4% of the time that he should win 10% of the time (and the same for Harris). I correct this by boosting a certain percentage of the probability numbers that are over 50% and contracting a certain percentage of those under 50%. Which numbers get adjusted is also chosen at random, so the simulation remains fully autonomous.

With the probability number established for each state, the forecast can now use the normal distribution equation to predict the margin of victory for Harris and Trump in each state.
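
The sketch below illustrates one way the three random draws could be combined into a single probability number, stretched away from 50% to restore realistic upset odds, and converted into a simulated margin. The mixing weights, the stretch factor, and the share of draws adjusted are illustrative placeholders standing in for the model's calibrated values.

```python
# A stylized sketch of correlated state probabilities and simulated margins.
# Mixing weights, stretch factor, and adjustment share are assumptions.
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def state_probability(state_u: float, national_u: float, demographic_u: float,
                      rng: random.Random,
                      weights: tuple[float, float, float] = (0.45, 0.30, 0.25)) -> float:
    # Combine the three draws in z-space so the mixture stays a valid probability.
    zs = [STANDARD_NORMAL.inv_cdf(min(max(u, 1e-9), 1 - 1e-9))
          for u in (state_u, national_u, demographic_u)]
    z = sum(w * zi for w, zi in zip(weights, zs))
    # Averaging draws pulls values toward 50%, making upsets too rare. Randomly
    # stretch some draws away from 50% (above 50% goes higher, below goes lower).
    if rng.random() < 0.5:
        z *= 1.6
    return STANDARD_NORMAL.cdf(z)

def simulate_state_margin(mean_margin: float, state_sd: float, probability: float) -> float:
    p = min(max(probability, 1e-9), 1 - 1e-9)
    return NormalDist(mean_margin, state_sd).inv_cdf(p)
```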

3. The Results of each Simulation

For each of the 50,000 simulations run daily, the Simulator checks whether each state voted Republican or Democratic and assigns the winning party that state's electoral votes. It then checks which party won at least 270 electoral votes, or whether there is a 269-269 tie.

The simulator runs 2,000 simulations at a time, repeating the process 25 times daily to reach 50,000 simulations. I track how many electoral votes each party wins across those simulations to determine the likelihood of each party winning the election.
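
The daily tally amounts to counting electoral votes in each simulated map and checking the 270 threshold, roughly as in the sketch below.

```python
# A minimal sketch of the daily tally across simulated election maps.
def tally_simulations(simulated_margins: list[dict[str, float]],
                      electoral_votes: dict[str, int]) -> dict[str, float]:
    harris_wins = trump_wins = ties = 0
    harris_ev_total = 0
    for margins in simulated_margins:                 # one margin dict per simulation
        harris_ev = sum(ev for state, ev in electoral_votes.items()
                        if margins[state] > 0)        # positive margin = Harris carries it
        harris_ev_total += harris_ev
        if harris_ev >= 270:
            harris_wins += 1
        elif harris_ev == 269:
            ties += 1
        else:
            trump_wins += 1
    n = len(simulated_margins)
    return {
        "harris_win_pct": harris_wins / n,
        "trump_win_pct": trump_wins / n,
        "tie_pct": ties / n,
        "harris_avg_ev": harris_ev_total / n,
    }
```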

I spent a great deal of time testing the simulator on every election since 1972 to ensure it both correctly identified the odds of an upset and came as close as possible to predicting the correct number of electoral votes each candidate would win.

I conducted these tests at various points throughout the election cycle, ranging from 100 days before election day to the day itself. This process helped me identify areas where the model was falling short. I devoted significant time to tweaking the forecast to better predict the likelihood of a candidate winning an upset. This required the model to accurately detect when a candidate might overperform in a specific region or among certain voter groups, potentially creating a path to winning the electoral vote despite being the underdog.

III. Concluding Thoughts

Credit: First, I want to thank my friends at Split Ticket, particularly Lakshya Jain and Adam Carlson, whom I constantly checked in with while designing the forecast to see whether they thought my unconventional ideas might have predictive value. Split Ticket is an up-and-coming election forecasting site, like RacetotheWH, with a strong track record from their debut last cycle. I strongly recommend you follow them.

Second, I'd like to thank the team at FiveThirtyEight, who compiled a polling average dating back to the 1960s and made it publicly available. This saved me hundreds of hours tracking down polls from past decades. It only took a few days of trying to scan local Michigan newspapers from the 1970s for old polls to appreciate how much unpleasant work FiveThirtyEight saved me from.

Future Work on the Model: I’m still working on a few minor tweaks to the model that I think could slightly improve the forecast. I am still refining my understanding of how much uncertainty to account for in the national popular vote projection on each day leading up to the election. If I make adjustments to this or any other aspect of the model, I will post the changes here.

I will be adding more interactive graphics to the site in the coming days and weeks, providing greater detail on the factors driving the election in each state and in the national popular vote. I will also add a feature that allows users to run a simulation of election night, which will update each morning with a new simulation.

Thanks for following my work at RacetotheWH, and feel free to contact me at RacetotheWH@gmail.com