Another spate of high-profile and provocative psychology studies have failed to replicate, dealing blows to the theories that fiction makes readers empathetic, for example, or that the internet makes us dumber.
At a time when psychology researchers are increasingly concerned about the rigor of their field, five laboratories set out to repeat 21 influential studies. Experiments in just 13 of those papers — or 62% — held up, according to an analysis published Monday.
The eight papers that did not fully replicate — seven in the journal Science, one in Nature — have been cited hundreds of times in scientific literature and many were widely covered by the media.
Failing to replicate isn’t definitive proof that a finding is false, particularly in cases where other studies support the same general idea. And some scientists told BuzzFeed News they do not agree with how the replications were done.
Still, the new findings are part of an overwhelming, and troubling, trend. The so-called reproducibility crisis has hit research in many fields of science, from artificial intelligence to cancer. Shoddy psychology research has received the most attention, with a 2015 report replicating just 36% of 97 studies.
It makes sense that scientists want to publish data that is surprising or counterintuitive. “That’s not a bad thing in science, because that’s how science breaks boundaries,” said Brian Nosek, a University of Virginia psychologist and executive director of the Center for Open Science, which led the replication project.
But too few scientists, he said, recognize the inherent uncertainty of their splashy results. “It’s okay if some of those turn out to be wrong,” he said.
In science, however, the deck is stacked against humility. Scientists are judged by how much they publish, and most journals won’t publish papers that find negative results. And the most prestigious titles — including Science and Nature — screen for the most novel, most surprising findings that will stand out to scientists across disciplines.
“If you’re saying something has to be surprising and wow people, you’re almost by definition saying, ‘We’re picking things that are further away from what we already think is true and therefore less likely to be true,’” Sanjay Srivastava, a University of Oregon psychologist not involved with the replication project, told BuzzFeed News. “That’s not how a lot of everyday, incremental, good scientific work, for the most part, happens.”
The new analysis zeroed in on psychology papers published in Nature and Science between 2010 and 2015. Many papers described the results of multiple experiments, but due to budget constraints, Nosek and his team chose to replicate just the first experiment described in each. By recruiting participants through Amazon Mechanical Turk and on college campuses, the researchers ran the experiments on samples about five times as large as the originals.
Some scientists whose studies didn’t replicate are taking the bad news in stride.
For example, back in 2012, Will Gervais reported in Science that people were less likely to believe in God when they looked at pictures of “The Thinker” statue versus other statues.
“Our study in hindsight was outright silly,” said Gervais, a University of Kentucky psychologist.
In a 2013 study, David Kidd and a colleague reported in Science that readers were better at guessing people’s emotions after reading fiction versus nonfiction. The New York Times summed up the takeaway as: “For Better Social Skills, Scientists Recommend a Little Chekhov.”
One of the key experiments in that study was not replicated in the new project. Kidd, a postdoctoral fellow at the Harvard Graduate School of Education, told BuzzFeed News that the failed replication has taught him a lot.
“A few years ago I would have thought that that study was bulletproof because I had a small sample, but a significant effect,” he said, referring to the experiment’s 86 people. “That’s not accurate — that’s not how statistics works.”
At the same time, Kidd noted that one of the paper’s other experiments has been replicated by another group. Other scientists, too, said that a single nonreplicated experiment doesn’t necessarily indict an entire theory.
Another Science paper, published in 2012 and covered by the New York Times and Slate, sought to explain a phenomenon of poverty: how scarcity, whether of money or time, changes how people focus their attention, which in turn can lead to behaviors like borrowing too much money.
In one of the experiments, “poor” participants given a small number of letters during an electronic “Wheel of Fortune” game performed worse on attention tests afterward than the “rich” participants who were given more letters. The study authors suggested that the poor players’ minds were more fatigued, a condition that might lead to riskier, less thought-out decisions. But the finding did not hold up when the Center for Open Science team repeated the experiment.
One of the study’s authors, Anuj Shah of the University of Chicago, also failed to replicate the “Wheel of Fortune” experiment. But his lab was able to confirm the other four experiments in the paper, which were not attempted by Nosek’s team. Shah said he was “disappointed” that the first experiment didn’t replicate, but “that’s how science moves forward.”
His collaborators, economist Sendhil Mullainathan of Harvard University and psychologist Eldar Shafir of Princeton University, wrote a 2013 book, Scarcity: Why Having Too Little Means So Much, which was partly based on the research. But the pair told BuzzFeed News that the “Wheel of Fortune” experiment is not mentioned in the book.
Perhaps the most well-known paper tackled by the replication researchers was about how online access to information shapes people’s minds. The 2011 paper made headlines in Business Insider (“Google Is Destroying Our Memories”), the New York Times, Time, Wired, and the BBC, among others.
One of its experiments presented subjects with difficult trivia questions, followed by various words related to computers and search engines, as well as noncomputer terms, in different colors. Participants were slower to identify the colors of the computer-related terms, because, the authors theorized, they were distracted by the words themselves — perhaps because they were already thinking about looking up the questions online. “It seems that when we are faced with a gap in our knowledge,” they wrote, “we are primed to turn to the computer to rectify the situation.”
The original scientists did not participate in the replication because the Center for Open Science had trouble getting in touch with them: one of them has died and the two others have left academia. They include Betsy Sparrow, a former psychologist at Columbia University, who disagrees with the replication methods. The computer-related terms should have been updated to refer more to smartphones, she argues.
Still, based on the other experiments in the paper and other studies, she believes in the underlying idea that people don’t remember information as well when they know they can look it up online. “People definitely think about the internet when they want to know something,” Sparrow told BuzzFeed News.
Nosek, the replication project leader, said that scientists used to be defensive about his efforts. But now, more journals and researchers see the importance of making research more rigorous.
Since 2013, Nature has required scientists to complete a checklist with information about their experiments’ methods and conclusions. Scientists in some fields, including behavioral sciences, also have to provide certain experimental details that are published alongside the studies. (The Center for Open Science’s replication results are being published in a Nature journal called Nature Human Behaviour.)
Science put out guidelines in 2015 for research quality and transparency, and requires scientists to make available all their data and materials. In addition, the journal now has a board of reviewers who find statistics experts to evaluate the statistical elements of papers.
In the last four years, 125 journals, mostly in the behavioral sciences, have adopted “registered reports,” in which journals approve a study’s methods before the results come in. The idea is to minimize the chances of retroactively changing the methods to make the data fit a tidy conclusion. Similarly, more than 20,000 studies have been preregistered on the Center for Open Science’s website.
“What’s positive in the change in culture over time is we’re not focused on skepticism about the process — whether we should replicate or not — but rather about the phenomena we’re investigating,” Nosek said. “That’s where scientific debates are rich and productive.”
Eldar Shafir's name was misspelled in an earlier version of this post.