May 7, 2015, was a banner day for Britain’s Conservative Party. After five years of uneasy coalition government, the Conservatives easily gained an overall majority in the UK parliament — defying the pundits’ expectations.
For the pollsters, it was an unmitigated disaster. They had predicted a dead heat, with polling averages suggesting that the Conservatives and their rival party, Labour, would each win 34% of the popular vote. Not one firm had put the Conservatives more than a single percentage point ahead. Yet when the votes were tallied, the Conservatives won 38% to Labour’s 31%.
Nine months later, a panel of political scientists and survey experts delivered a damning post-mortem on the pollsters’ performance. They considered several possible explanations: Were “shy” Conservatives lying about their voting intentions? Was Labour undermined by “lazy” supporters who couldn’t be bothered to vote? No, the report concluded, there was a more fundamental problem with the pollsters’ methods: They had polled too many Labour supporters. In other words, their polling samples simply hadn’t been representative of the real electorate.
It was an astonishing conclusion. The cardinal rule of survey research is that the sample has to be representative of the population you are interested in. How could the pollsters have screwed up so monumentally?
Quite easily, if you consider the perfect storm that has buffeted the polling industry in recent years. As the way in which we use our phones and the internet has shifted, it has become harder and more expensive to recruit representative samples — just as the media companies that commission most election polls have been hit by declining revenues.
US pollsters face exactly the same pressures as their British counterparts. And while they’ve so far avoided major embarrassment, the next few years will provide a serious test of their ability to keep calling elections correctly.
“There’s nothing in the US that makes us immune,” Courtney Kennedy, director of survey research with the Pew Research Center in Washington DC, told BuzzFeed News.
Random sampling is how it used to be done.
The gold standard for polling is called Random Digit Dialing, or RDD. Randomly selecting telephone numbers (whether landline or mobile) should get pollsters close to a representative sample of the population.
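The core idea can be sketched in a few lines of Python. The prefixes below are invented placeholders, not real exchanges: in classic RDD, a working area code and exchange are chosen, then the final digits are drawn at random, which is what lets pollsters reach unlisted numbers.

```python
import random

# Illustrative sketch of Random Digit Dialing. The six-digit
# area-code + exchange prefixes here are made-up placeholders.
VALID_PREFIXES = ["212555", "312555", "415555"]

def rdd_number(rng=random.Random(1)):
    # Pick a working prefix, then randomize the last four digits,
    # so unlisted and newly assigned numbers can still be reached.
    return rng.choice(VALID_PREFIXES) + f"{rng.randrange(10_000):04d}"

sample_frame = [rdd_number() for _ in range(5)]
print(sample_frame)
```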
RDD is expensive, however — and getting even more so. Back in the 1970s and 1980s, pollsters could count on response rates of 70% or more in a telephone survey. Today, people screen their calls and are less inclined to be interviewed by a pollster even if they pick up. So response rates have dropped to less than 10%.
At the same time, many of the traditional media companies that commission election polls have less money to spend on them, as they’ve struggled to migrate their advertising revenues online.
Fortunately for a poll-hungry media, the internet gives as well as taking away. A new breed of polling companies — including YouGov, SurveyMonkey, and Morning Consult — has emerged that can access millions of people who’ve volunteered to take surveys online. Online polls can now be completed at a fraction of the cost of RDD phone surveys.
(Disclosure: BuzzFeed News has this year worked with the polling firm Ipsos on online surveys of LGBT attitudes to the presidential candidates’ policies, and Americans’ responses to terrorism; Morning Consult is also currently asking questions on our behalf.)
But online, there’s no way to sample randomly.
The big difference is that everyone who participates in these panels has opted in, often by clicking an ad. “There is no truly ‘right’ way to recruit on the internet,” Kennedy said.
To get around this problem, online pollsters collect demographic information from their volunteers, such as age, gender, and race. For each individual poll, they can then draw from this pool, picking participants so that the sample matches, as closely as possible, the population breakdown revealed in US Census Bureau data.
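A minimal sketch of that matching step, using an invented panel and illustrative target shares (stand-ins, not real Census Bureau figures), might look like this:

```python
import random

rng = random.Random(0)

# Hypothetical opt-in panel: each volunteer is tagged with an age bracket.
panel = [{"id": i, "age": rng.choice(["18-29", "30-64", "65+"])}
         for i in range(10_000)]

# Illustrative population targets (stand-ins for Census Bureau shares).
targets = {"18-29": 0.21, "30-64": 0.62, "65+": 0.17}
poll_size = 1000

# Quota draw: fill each demographic cell up to its target count.
sample = []
for group, share in targets.items():
    members = [p for p in panel if p["age"] == group]
    sample.extend(rng.sample(members, round(poll_size * share)))

print(len(sample))  # 210 + 620 + 170 = 1000
```

Real panel firms match on many more variables at once (and, as the next section describes, the choice of matching variables is exactly where things can go wrong).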
But if your selection criteria are flawed, things can go badly awry. YouGov operates on both sides of the Atlantic, and its chief scientist admitted that it slipped up with the 2015 UK general election. “We put in too many younger people who said they were going to vote Labour,” Douglas Rivers, also a political scientist at Stanford University, told BuzzFeed News.
In fact, British pollsters hadn’t been using randomized samples for more than two decades. (In part, this is because RDD is harder to perform in a country that lacks a standardized format for telephone numbers.) And it turns out 2015 wasn’t the first time UK pollsters had been seriously wide of the mark: In 1997, when Tony Blair came to power, they overestimated Labour’s lead over the Conservatives by more than 6 percentage points.
“The absolute error was the same,” Patrick Sturgis of the National Centre for Research Methods at the University of Southampton, who led the post-mortem into the 2015 failure, told BuzzFeed News. But because the pollsters had called the winner correctly, the 1997 error didn’t make big waves.
(After the Brexit vote this past June, UK pollsters again got a bad rap for suggesting that Remain would narrowly prevail. In reality, however, the split of the vote was within the margin of sampling error, so statistically they weren’t too far off.)
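For context, the “margin of sampling error” is the familiar plus-or-minus figure attached to a poll. Here is a rough sketch using the textbook formula for a simple random sample (real polls, once weighted, have somewhat larger effective margins):

```python
import math

def margin_of_error(p, n, z=1.96):
    # Classic 95% margin of error for a proportion p estimated
    # from a simple random sample of n respondents.
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 1,000 people showing a near 50/50 split is only
# accurate to about +/- 3 percentage points either way.
print(round(100 * margin_of_error(0.5, 1000), 1))  # 3.1
```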
Given the British experience, you would think that everyone would choose RDD polling over online samples. But the cost of telephone polling remains a major obstacle. And even if it weren’t a factor, online pollsters argue that the traditional gold standard is itself badly tarnished.
Today’s “random” samples are not as random as they once were.
Even though telephone polls start off with a random set of numbers to dial, low response rates mean that the samples pollsters end up with may not be representative — typically younger and less-educated voters are thin on the ground.
Michael Ramlet, Morning Consult’s co-founder and CEO, told BuzzFeed News that the idea that any pollster is today working with a true random sample is “essentially a myth.”
To correct any departures from known demographics, pollsters have long “weighted” their results. If a sample contains only half as many 18 to 25 year olds as it should, for example, and twice as many people over 65, the young voters would be given a weight of two, and the older ones a weight of 0.5. So every young person who said they intended to vote for Trump or Clinton would count twice, while two older supporters would be needed to register a single preference for each candidate.
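That arithmetic amounts to dividing each group’s population share by its share of the sample. A sketch with illustrative proportions mirroring the example above:

```python
# Post-stratification weights: population share / sample share.
# The shares are illustrative, echoing the example in the text:
# half as many young respondents as there should be, twice as many old.
population = {"18-25": 0.20, "26-64": 0.60, "65+": 0.20}
sample     = {"18-25": 0.10, "26-64": 0.60, "65+": 0.40}

weights = {g: population[g] / sample[g] for g in population}
print(weights)  # {'18-25': 2.0, '26-64': 1.0, '65+': 0.5}
```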
If weighting factors become extreme, a small number of responses from an underrepresented group can skew the overall poll result. And as telephone poll response rates have dwindled, the weighting factors used by pollsters have grown.
Because online pollsters pick their participants to try to match known demographics before they start a survey, they mostly use smaller post-survey weighting factors than telephone pollsters do. YouGov typically doesn’t need to apply weights greater than 1.3 for a US national sample, Rivers told BuzzFeed News.
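One common way to quantify the cost of extreme weights (a standard measure in survey statistics, not one the pollsters quoted here necessarily cite) is Kish’s effective sample size, which shrinks as weights become more variable:

```python
def effective_sample_size(weights):
    # Kish's approximation: (sum of weights)^2 / sum of squared weights.
    # Uniform weights keep the full sample; variable weights shrink it.
    return sum(weights) ** 2 / sum(w * w for w in weights)

print(effective_sample_size([1.0] * 1000))                # 1000.0
# The same 1,000 respondents under heavy weighting behave
# like a poll of roughly 236 people.
print(effective_sample_size([4.0] * 100 + [0.25] * 900))
```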
So one way or another, pollsters are all having to make adjustments to compensate for the difficulty in obtaining representative samples. And when trying to call elections, they also have to figure out which respondents are actually likely to vote — which only six out of ten eligible voters did in the 2012 presidential election.
There’s no agreed method for identifying likely voters, and, combined with differences in how pollsters apply weighting factors, the choice of method can make a big difference to reported poll results. Earlier this week The Upshot reported that it had given the same raw polling data to four different pollsters, getting a spread of results from a 4-percentage-point lead for Clinton to a 1-point lead for Trump.
Pollsters say they’re independent, but sometimes they “herd” together.
The wiggle room offered by weighting and likely voter modeling may also explain another disturbing conclusion from the 2015 UK election polling post-mortem: a phenomenon called “herding,” in which the pollsters seemed to converge on the same, wrong answer.
“What was surprising was that they were all wrong in the same way,” Sturgis said.
This doesn’t mean that pollsters were colluding, but instead probably indicates that they were each making decisions about sample adjustment, weighting, and likely-voter modeling with one eye on the results coming out from other firms — rejecting choices that would have moved their results away from the crowd.
Despite the 2015 British debacle, most of the US polling experts consulted by BuzzFeed News do not expect a similar failure in the 2016 presidential election, given that it is subjected to more intensive polling using a variety of different methods.
“The presidential election is being studied to death,” said Timothy Johnson, who heads the Survey Research Laboratory at the University of Illinois at Chicago. “I think the polls are doing a very good job collectively.”
So far, at least, the accuracy of presidential polls has held up well.
[Chart: average candidate error, the difference in percentage points between the polls’ estimates and actual support for each candidate in the election.]
But by the next election cycle, inexorable economic and logistic pressures may mean that the polling landscape in the US will look more like that in Britain, with opt-in online polls dominating. “By the 2020 election, telephone surveys could be a residual category,” said Johnson.
In an imperfect world, which polling method is best?
Head-to-head comparisons between RDD phone polls and those run using opt-in online samples have so far produced mixed results. In 2011, researchers led by Jon Krosnick and David Yeager of Stanford University compared the results for online and RDD phone polls run between 2004 and 2009, concluding that the online polls were less accurate.
“If real scientific surveys end up stopping because there are no more call centers and the infrastructure all switches over to the internet, at that point, surveys, as far as I’m concerned, will die,” Krosnick told BuzzFeed News.
But based on more recent studies, other polling experts take a less apocalyptic view. In 2014, political scientists Stephen Ansolabehere of Harvard University and Brian Schaffner of the University of Massachusetts found little difference between the results of an RDD phone poll and one run online by YouGov, using an opt-in sample adjusted to reflect the makeup of the wider population.
And in May this year, the Pew Research Center compared the performance of one of its own random samples with nine separate online opt-in panels. Testing the samples in an online survey, Pew found wide variation in their performance, with its own random sample falling in the middle of the pack. YouGov, which employed elaborate sample adjustments and weighting, came out on top.
So maybe the online pollsters are right when they say their adjustments can compensate for the lack of random sampling.
“It is coming along, and may well get there,” Charles Franklin, a political scientist and traditional phone pollster at Marquette University in Milwaukee, Wisconsin, told BuzzFeed News. “We’ll know more in November.”
Several parts of this story were changed to more accurately reflect the nuances of polling methodology:
Randomly selecting telephone numbers should get pollsters close to a representative sample of the population, but it does not ensure that everyone who has a phone has an equal probability of being sampled.
Online polls can now be completed at a fraction of the cost of RDD phone surveys. But RDD phone surveys generally take days, not weeks, to complete.
An example about the weights of the telephone polls done by Langer Research Associates for ABC News has been removed because these weights have been used for several years.
Most of the polling experts consulted by BuzzFeed News expect the 2016 presidential election polls, overall, to be fairly accurate.
Telephone poll samples with low response rates may not be representative. Exactly how big a problem this is in practice is a matter of debate.