Here’s How We’re Going To Grade The 2016 Election Forecasts

We’re laying out our methodology ahead of time, so you can get a sneak peek — and hold us accountable.

Update, Nov. 28: Results published here.

America’s political prognosticators have spent four years waiting for Tuesday. Did they predict the election correctly? Did they outperform their peers’ forecasts? And will “drunk Nate Silver” ride the subway, telling strangers the day they will die?

As the votes roll in, BuzzFeed News will help answer the first two questions. In this post, I’ll spell out our methodology — partly because you might be interested, and partly to avoid accusations that we’ve “moved the goalposts” to influence the ratings.

Our grades will focus on eleven separate models from nine forecasters:

For each forecast model, we’ll examine the following predictions:

  • Every statewide presidential prediction, plus DC’s. That means we won’t be grading predictions for Maine's and Nebraska’s district-specific electoral votes; the forecasters handle these races in slightly different ways, and some not at all.

  • Every Senate prediction, except for California and Louisiana. This year, California’s Senate race pits two Democrats against one another, while Louisiana’s contest is technically a primary. (If one candidate in Louisiana gets a majority of the votes, however, they win the seat outright, and the general election otherwise scheduled for Dec. 10 won’t be needed.)

We’ll primarily grade each set of predictions using something called the Brier score. It’s a widely accepted calculation for quantifying the accuracy of probabilistic predictions. (In fact, FiveThirtyEight itself has used it to evaluate its own March Madness and UK general election predictions.) There are a few nice things about the Brier score:

  • For each forecast, it produces an easy-to-understand number, ranging from 0 to 1. A score of 0 is perfection: You were 100% confident on every prediction and got them all right. A score of 1 is total error: You were 100% confident on every prediction and got them all wrong.

  • It rewards confidence on correct predictions, and penalizes confidence on incorrect predictions. If two forecasters predict every state correctly, for example, the more confident forecaster will score better.

  • It disproportionately penalizes overconfidence. For example, if you gave Trump a 70% chance in Georgia but he loses there, you’d receive a 0.49-point penalty. But if you gave him an 80% chance, you’d receive a 0.64-point penalty.
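
Here’s a minimal Python sketch of that arithmetic. It scores each race as the squared difference between 1 and the probability assigned to the eventual winner, then averages across races, which matches the 0-to-1 range and the Georgia example above; treat it as an illustration, not our production grading code.

```python
# Rough sketch of the Brier arithmetic described above: each race is scored
# as (1 - probability assigned to the eventual winner)^2, and a forecast's
# overall score is the average across races. Illustrative only.

def race_penalty(prob_of_winner):
    """Penalty for one race, given the probability the forecast assigned
    to the candidate who actually won (0 = perfect, 1 = worst)."""
    return (1 - prob_of_winner) ** 2

def brier_score(probs_of_winners):
    """Unweighted Brier score: the mean penalty across all graded races."""
    return sum(race_penalty(p) for p in probs_of_winners) / len(probs_of_winners)

# The Georgia example: a 70% chance for Trump (so 30% for Clinton) costs
# 0.49 if Clinton wins there; an 80% chance costs 0.64.
print(round(race_penalty(0.30), 2))  # 0.49
print(round(race_penalty(0.20), 2))  # 0.64
```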

For each forecast we’ll produce up to three Brier scores:

  • An unweighted Brier score for the presidential election, in which the predictions for each state count equally.

  • An electoral vote–weighted Brier score for the presidential election, in which each prediction is weighted by the state’s number of electoral votes; a sketch of this weighting follows the list below. (Because we’re ignoring Maine’s and Nebraska’s district-specific electoral votes, those states will each receive a weight of 2 — the number of electoral votes assigned on a statewide basis.)

  • An unweighted Brier score for Senate predictions.
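
The electoral vote weighting works like this rough sketch, in which each state’s penalty is multiplied by its electoral votes before averaging; the two-state example is made up for illustration.

```python
# Sketch of the electoral vote-weighted Brier score: each state's penalty is
# multiplied by its electoral votes (Maine and Nebraska counted as 2 each),
# and the total is divided by the summed weights. Illustrative only.

def weighted_brier(penalties, electoral_votes):
    """Both arguments are parallel lists with one entry per state (plus DC)."""
    weighted_sum = sum(p * ev for p, ev in zip(penalties, electoral_votes))
    return weighted_sum / sum(electoral_votes)

# Hypothetical two-state example: a 0.04 penalty in a 38-electoral-vote state
# and a 0.49 penalty in a 16-electoral-vote state.
print(round(weighted_brier([0.04, 0.49], [38, 16]), 3))  # ~0.173
```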

We’ll also provide some other helpful ways of understanding the forecasts’ accuracy:

  • For forecasts that publish estimates of the Clinton-Trump percentage vote difference, we’ll also calculate a score — the root-mean-square error — that quantifies how closely those forecasts predicted the final spread; a sketch of that calculation follows this list.

  • To complement the final Brier scores, we’ll also chart the Brier scores over time — for forecasts that have made their prediction history available — to provide a sense of each model’s volatility.

  • For both the Senate and presidential races, we’ll also list each forecast’s raw number of correct and incorrect predictions.
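
The root-mean-square-error calculation looks like the sketch below; the three-state spreads in the example are invented, not real forecasts.

```python
import math

# Sketch of the root-mean-square error on the Clinton-Trump spread: square
# each state's miss (predicted margin minus actual margin, in percentage
# points), average the squares, then take the square root. Illustrative only.

def rmse(predicted_margins, actual_margins):
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted_margins, actual_margins)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical three-state example (Clinton minus Trump, in points):
print(round(rmse([3.0, -5.0, 1.5], [1.1, -5.2, -0.3]), 2))  # ~1.52
```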

We’ll base these scores on the called races at the time of calculation. If we publish scores calculated before all races are called, we’ll also publish each forecast’s best/worst possible final scores.
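
One simple way to compute those bounds, sketched below: for each uncalled race, the best case assumes the forecast’s favorite wins, and the worst case assumes its least-likely candidate wins. The example race is hypothetical.

```python
# Sketch of best/worst possible final penalties for an uncalled race: the
# best case assumes the candidate the forecast favored wins; the worst case
# assumes its least-likely candidate wins. Illustrative only.

def race_penalty(prob_of_winner):
    return (1 - prob_of_winner) ** 2

def penalty_bounds(candidate_probs):
    """candidate_probs maps each candidate in a race to their forecast odds."""
    best = race_penalty(max(candidate_probs.values()))
    worst = race_penalty(min(candidate_probs.values()))
    return best, worst

# Hypothetical uncalled race:
print(penalty_bounds({"Clinton": 0.55, "Trump": 0.44, "Johnson": 0.01}))
```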

Fine print:

  • Any forecast that does not provide probabilities for third-party candidates — e.g., for Evan McMullin in Utah or Gary Johnson in any state — will be considered to be giving that candidate a 0% chance of winning.

  • In cases where, due to rounding, a projection’s probabilities for all candidates in a race don’t total exactly 100%, those odds will be adjusted proportionally so that they do (e.g., 80% and 21% become 80/101 ≈ 79.2% and 21/101 ≈ 20.8%); see the sketch after this list.

  • When calculating the number of correct predictions for forecasts that provide tied odds for the favorite (e.g., 50%/50% or 40%/40%/20%), we’ll give half-credit if either of those candidates wins.

  • We’ll be using each forecast’s predictions as of 11:59 p.m. ET on Monday, Nov. 7.
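
Here’s a rough sketch of two of those rules: the proportional rescaling and the half-credit for tied favorites. The function names and example probabilities are just for illustration.

```python
# Sketches of two fine-print rules: proportional rescaling of probabilities
# that don't sum to 100%, and half-credit when the forecast's top probability
# is tied among several candidates. Illustrative only.

def normalize(probs):
    """Rescale a race's candidate probabilities so they sum to exactly 1."""
    total = sum(probs.values())
    return {candidate: p / total for candidate, p in probs.items()}

def call_credit(probs, winner):
    """1 for an outright correct call, 0.5 if the top probability is shared
    and the winner is among the co-favorites, 0 otherwise."""
    top = max(probs.values())
    favorites = [c for c, p in probs.items() if p == top]
    if winner not in favorites:
        return 0.0
    return 1.0 if len(favorites) == 1 else 0.5

# The rounding example from above: 80% and 21% rescale to about 79.2% and 20.8%.
print(normalize({"Clinton": 0.80, "Trump": 0.21}))
print(call_credit({"Clinton": 0.5, "Trump": 0.5}, "Trump"))  # 0.5
```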

Questions?

Update at 4:25 p.m.: Added PollSavvy to list of forecasts.
