The NYC Mayor Forecast
RacetotheWH Forecast & Interactives
Healthy Dose of Skepticism:
I've worked hard on this forecast, and I think it is well-positioned to succeed. Nonetheless, I would strongly caution you to read my projections with a healthy dose of skepticism. Election forecasts thrive or fall based on the assumptions about American politics that underpin them, and this is the first-ever ranked-choice election in NYC Mayoral history. Time will tell if those assumptions end up being correct.
If you followed RacetotheWH last year, then you should also know that this forecast doesn't contain the same level of scientific rigor. That year, my forecast was one of the most accurate in the nation, and it succeeded because I followed the scientific method. Every time I had a hypothesis about a new feature that could help me predict the election result, I tested it rigorously. I used data from all fifty states in every presidential election from 2000 to 2016 to ensure every new feature had predictive value. In the Senate forecast, I used data from every election since 1970. I did everything I could to fine-tune and perfect my election forecast.
This was impossible to replicate in NYC because this election is the first of its kind. Moreover, primaries tend to produce more wacky results than general elections. Therefore, I'd consider this to be a useful projection of what the race is likely to look like, but don't take it as the gospel truth.
Baseline Projection:
Ranked-choice voting provides a unique challenge when it comes to forecasting an election. To start, however, we have to predict the share each candidate will get in the first round - each candidate's Baseline Projection. Polling is the most important factor here, and I get it straight from my NYC Mayor Polling Average. I give more weight to polls that are recent, that have large sample sizes, and that come from pollsters with strong track records. I also correct for bias whenever a poll is released by one of the candidates' campaigns.
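Here is a rough sketch of how a weighted polling average like this could work. To be clear, the decay rate, sample-size cap, pollster ratings, and the flat penalty for campaign-sponsored polls below are illustrative placeholders, not the forecast's actual parameters.

```python
from datetime import date

# A rough sketch of a recency/size/quality-weighted polling average.
# All numeric parameters here are illustrative assumptions.
def poll_weight(poll, today):
    days_old = (today - poll["end_date"]).days
    recency = 0.9 ** days_old                    # newer polls count more
    size = min(poll["sample_size"] / 600, 2.0)   # bigger samples count more, capped
    quality = poll["pollster_rating"]            # e.g. 0.8 (weak) to 1.2 (strong)
    return recency * size * quality

def polling_average(polls, candidate, today):
    num = den = 0.0
    for p in polls:
        w = poll_weight(p, today)
        share = p["shares"][candidate]
        if p.get("sponsor") == candidate:
            share -= 3.0   # bias correction for the candidate's own campaign
                           # polls (an assumed 3-point penalty, for illustration)
        num += w * share
        den += w
    return num / den

polls = [
    {"end_date": date(2021, 6, 10), "sample_size": 800, "pollster_rating": 1.2,
     "shares": {"Adams": 22.0, "Garcia": 15.0}},
    {"end_date": date(2021, 6, 1), "sample_size": 500, "pollster_rating": 0.8,
     "shares": {"Adams": 18.0, "Garcia": 17.0}, "sponsor": "Adams"},
]
print(round(polling_average(polls, "Adams", date(2021, 6, 14)), 1))
```

The older, campaign-sponsored poll gets discounted twice: once for its age and sample, and once via the sponsor penalty.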
Next, I adjust their polling slightly for momentum. This is important to do in a primary, because there are often huge swings in the last few days of an election, while polling tends to be a snapshot of what voters felt a week ago. A surging candidate will often slightly overperform their election-day polling, as Bill de Blasio did in 2013. To correct for this, I measure the change in each candidate's polling over the last 14 days, then add 15% of that change onto their current polling. This affects candidates that are losing support as well.
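The adjustment itself is simple enough to show directly:

```python
# The momentum adjustment: add 15% of the candidate's 14-day polling change
# onto their current polling average.
def momentum_adjusted(current, fourteen_days_ago, factor=0.15):
    return current + factor * (current - fourteen_days_ago)

# A candidate who surged from 14% to 20% gets nudged up: 20 + 0.15 * 6 = 20.9
surging = momentum_adjusted(20.0, 14.0)
# A candidate who slid from 20% to 14% gets nudged down: 14 - 0.15 * 6 = 13.1
fading = momentum_adjusted(14.0, 20.0)
print(surging, fading)
```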
Polling makes up 92% of the total projection - the other 8% comes from fundraising. Candidates with better fundraising are better able to get their message out, which provides a crucial strategic advantage during a campaign. Additionally, fundraising is a great sign of which candidates have the most grassroots support and will be better able to mobilize their voters on election day.
50% of the fundraising score comes from the number of donors each candidate has from NYC, with 25% each from total dollars raised and from dollars left to spend in their campaign's "war chest". I measure the percentage each candidate has relative to all candidates in the race. (Ex: Yang has raised 13.7% of the total money raised, holds 11.9% of all the money left unspent, and has 20.1% of all donors from NYC.)
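Putting those weights together, the fundraising score and the 92/8 blend look like this. The Yang fundraising shares are the figures quoted above, but the 18% polling average used in the blend is a made-up number for illustration.

```python
# The fundraising score per the weights above: 50% NYC donor share, 25% share
# of total dollars raised, 25% share of the unspent "war chest".
def fundraising_score(donor_share, raised_share, war_chest_share):
    return 0.50 * donor_share + 0.25 * raised_share + 0.25 * war_chest_share

# The overall Baseline Projection blend: 92% polling, 8% fundraising.
def baseline_projection(poll_share, fund_score):
    return 0.92 * poll_share + 0.08 * fund_score

# Yang's figures from above: 20.1% of NYC donors, 13.7% of dollars raised,
# 11.9% of unspent money.
yang_fund = fundraising_score(20.1, 13.7, 11.9)     # = 16.45
# The 18% polling average here is hypothetical.
yang_baseline = baseline_projection(18.0, yang_fund)
print(round(yang_fund, 2), round(yang_baseline, 2))
```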
The Simulation - the First Round:
Next, we need to simulate voters' first choices in the election. I use something called a normal distribution, which helps me determine how likely a candidate is to get a given vote share based on their polling. To use a normal distribution, I also need to calculate the standard deviation, which essentially measures how much uncertainty we have about the accuracy of the projection. The less confident we are that the result will be close to the projection, the higher the standard deviation. Primaries are by their very nature far less predictable than general elections, so the standard deviation is quite high. It's individualized for each candidate: it's higher for high-polling candidates, and for candidates that have had big surges or declines in support. For all candidates, the standard deviation drops as we get closer to election day.
The last piece I need is a "probability number" for every single candidate, which I use in the normal distribution. These are random numbers generated by my computer, and they are different in all 20,000 simulations of the election. Essentially, a number near 0 is the absolute worst-case scenario for a candidate, and a number near 1 is the best-case scenario. Now, I can run the normal distribution for each candidate to get their share of the vote. Finally, I adjust the vote shares so that they combine to add up to 100%.
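The steps above can be sketched as one simulated first round in Python, using the inverse of the normal distribution to turn each probability number into a vote share. The baselines and standard deviations below are placeholder values, not the forecast's actual figures.

```python
import random
from statistics import NormalDist

# One simulated first round: draw each candidate's share from a normal
# distribution around their Baseline Projection via a uniform "probability
# number", then rescale all shares to sum to 100%.
def simulate_first_round(baselines, stdevs, rng):
    shares = {}
    for name, mean in baselines.items():
        # The "probability number": near 0 is the worst case, near 1 the best.
        # Clamp it slightly away from 0 and 1 so inv_cdf stays defined.
        p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        draw = NormalDist(mean, stdevs[name]).inv_cdf(p)
        shares[name] = max(draw, 0.0)            # no negative vote shares
    total = sum(shares.values())
    return {name: 100.0 * s / total for name, s in shares.items()}

# Placeholder baselines and standard deviations (higher for the frontrunner).
baselines = {"Adams": 22.0, "Garcia": 17.0, "Wiley": 16.0, "Yang": 15.0, "Others": 30.0}
stdevs = {"Adams": 6.0, "Garcia": 5.0, "Wiley": 5.0, "Yang": 5.0, "Others": 7.0}
result = simulate_first_round(baselines, stdevs, random.Random(42))
print({name: round(s, 1) for name, s in result.items()})
```

Running this 20,000 times with fresh random numbers each time would give the full distribution of first-round outcomes.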
Ranked Voting
Now, we get into the hard part - Ranked Choice Voting. Every voter has a chance to list their top five candidates. Once the initial vote is tallied, the candidate receiving the lowest number of votes gets eliminated, and their votes are redistributed to the candidates that those voters ranked second. This process keeps repeating itself until one candidate has over 50% of the vote. If this is confusing, feel free to look at my Ranked Choice Voting Simulation for the NYC Mayor Election, which shows the process in action.
Candidates' success in the election, and in the simulation, isn't merely a product of how well they win second-choice support overall - what matters is specifically how they do among the candidates that are eliminated before them. For example, Eric Adams gets a huge boost when Ray McGuire is eliminated - polls show him winning 54% of McGuire voters' support as their second choice. Kathryn Garcia benefits when Shaun Donovan is eliminated, as she gets 33% of his support.
Using data from pollsters, I've designed an average for every candidate that shows their voters' second-choice preferences. I use this as a baseline in my forecast. In each simulation, voters' second choices are a bit different, and I once again use a normal distribution and random numbers to simulate this process.
I run ten different rounds of Ranked Choice Voting. Each time, my simulation identifies the lowest-ranking candidate and redistributes their support to the remaining candidates. This process repeats until only two candidates are left, one of whom will have over 50% support. Curious to see how it works? You can run through five simulations of ranked-choice voting, with new ones uploaded after every update to the forecast. Every once in a while, a zany one might get included that has something highly unusual, like Eric Adams getting knocked out in the third round. Remember, each individual simulation isn't a projection - there's a reason I run the simulation 20,000 times. Zany things like that technically could happen, but they are highly unlikely.
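A stripped-down version of that elimination loop looks like this. The vote totals and most transfer shares are invented for illustration - only the 54% McGuire-to-Adams transfer mirrors the polling figure quoted earlier - and ballots whose transfer shares don't add to 100% simply exhaust and drop out.

```python
# A bare-bones ranked-choice elimination loop: drop the lowest candidate each
# round and redistribute their votes via second-choice transfer shares.
# Transfer shares may sum to less than 1; the remainder is exhausted ballots.
def run_rcv(votes, transfers):
    votes = dict(votes)
    while len(votes) > 2:
        loser = min(votes, key=votes.get)          # lowest-ranking candidate
        pool = votes.pop(loser)
        for name in votes:
            votes[name] += pool * transfers[loser].get(name, 0.0)
    return votes

# Hypothetical first-round totals (percent of the vote):
votes = {"Adams": 30.0, "Garcia": 26.0, "Wiley": 24.0, "McGuire": 20.0}
# Hypothetical transfer shares; 54% McGuire -> Adams matches the poll cited above.
transfers = {
    "McGuire": {"Adams": 0.54, "Garcia": 0.20, "Wiley": 0.15},
    "Wiley":   {"Garcia": 0.45, "Adams": 0.25},
    "Garcia":  {"Wiley": 0.40, "Adams": 0.20},
    "Adams":   {"Garcia": 0.30, "Wiley": 0.25},
}
final = run_rcv(votes, transfers)
print(final)  # the final Adams vs. Garcia head-to-head
```

In the full simulation, the transfer shares themselves are redrawn in every run, which is how unusual elimination orders occasionally appear.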
You might have noticed a flaw here - I'm not doing anything with voters' third-, fourth-, and fifth-choice preferences. Ideally, I'd be tracking each individual vote through the entire process. I have a few ideas on how I could design this and make it work - but unfortunately, it would be far too intensive for my computer to run.