Red wine is good for you. Or bad for you. Or good for you. And coffee drinkers have a better shot at surviving cancer. And men prefer women wearing red. And millennials are selfish narcissists. And there's a "liberal gene." And semen is an antidepressant.
These stories make for good headlines. But science isn't a lightning bolt. It's an incremental process: Slivers of evidence build on each other, over long periods of time, to (hopefully!) get at the truth.
Unfortunately, that means that scientific studies are often wrong — or, at least, aren't strong enough to show what they claim to. This sobering fact is underscored in a large study published Thursday in the journal Science showing that more than half of psychology studies can't be replicated.
The study was carried out by 270 researchers from 17 countries. They tried to reproduce the findings of 100 different psychology studies — everything from whether having what we want makes us happy to whether white people look at a black person in the room when they hear racist remarks.
This so-called Reproducibility Project: Psychology is the largest-ever effort to systematically redo social science experiments. And its results are somewhat disheartening: Researchers were able to replicate the findings of just 39 out of 100 studies.
"As a social psychologist I feel like, ugh, boy, I wish we were doing better than this," Brian Nosek, a psychologist at the University of Virginia and leader of the new study, told BuzzFeed News. "We hope that any individual study provides the answer, but they almost never do. Any study is just a single piece of evidence."
Nosek, concerned by science’s tendency to prioritize novel studies over digging deep to understand a phenomenon, launched the Reproducibility Project in 2011.
"Since graduate school, I've been interested in methodology," Nosek said. "When [my lab] saw stuff we were excited about in the literature, we would try to replicate it, and a lot of the time it would end there, because we couldn't get the original result."
The project began with Nosek sending an informal email to colleagues asking if they wanted to help him try to reproduce psychology studies. The response was way bigger than he expected: All told, 270 researchers wanted in.
Each replication team chose a study published in 2008 in one of three top psychology journals: Psychological Science, the Journal of Personality and Social Psychology, and the Journal of Experimental Psychology: Learning, Memory, and Cognition. Their task was to try to replicate the study's main finding using methods as close to the original as possible.
In general, the study found that papers on social psychology — which focuses on how people interact — were less likely to replicate than papers about learning, memory, language, and other cognitive skills.
Nosek's team also found other characteristics that seem to make papers more likely to replicate. For example, the most reproducible papers tend to have a low "p-value," a common statistic that estimates how likely you'd be to get a result at least as strong as the one observed just by chance, if the effect being studied didn't really exist.
Generally, a p-value under .05 is considered "statistically significant," meaning that if there were no real effect, results at least this strong would turn up less than 5% of the time. The Science paper showed that, as you would expect, lower p-values tend to be good markers of reproducibility: The closer a p-value was to 0, the more likely the researchers were to get the same results in a do-over.
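For readers who want to see the idea in action, here is a minimal Python sketch of one common way to estimate a p-value, a permutation test. It is only an illustration, not the method the Reproducibility Project or the original studies used; the function name `permutation_p_value` and the scores below are invented for the example. The logic: if the experimental condition had no real effect, the group labels wouldn't matter, so shuffling them should often produce a difference as big as the real one.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration. It assumes the "no real effect"
    scenario: labels are shuffled at random, and we count how often the
    shuffled groups differ at least as much as the real groups did.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomly reassign every score to a group
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Adding 1 to both counts (treating the real data as one of the
    # arrangements) keeps the estimate from ever being exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Made-up scores from two experimental conditions (not real data).
control = [4.1, 3.8, 5.0, 4.4, 3.9, 4.6, 4.2, 4.0]
treatment = [5.2, 4.9, 5.5, 4.8, 5.9, 5.1, 4.7, 5.6]

p = permutation_p_value(control, treatment)
print(f"estimated p-value: {p:.4f}")
print("significant at .05" if p < 0.05 else "not significant at .05")
```

With these invented numbers the two groups barely overlap, so very few shuffles match the real gap and the estimate comes out well under .05. If the two groups were identical, every shuffle would count and the p-value would be 1. Note that a small p-value says the result would be unusual if there were no effect; it does not by itself say the effect is real or will replicate.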
To continue these efforts, in 2013 Nosek co-founded the Center for Open Science, an organization dedicated to open communication between scientists. The Center runs an open-source repository for data called the Open Science Framework, where all of the data for the Reproducibility Project: Psychology was published on Thursday afternoon.
The reproducibility problem isn't limited to psychology research. In fact, the Center for Open Science is also carrying out other reproducibility efforts, including one on cancer biology.
If a study doesn’t replicate, it doesn’t necessarily mean that the scientists were sloppy, or that its hypothesis is wrong.
"I don't want people to walk away with the idea that if a given effect didn't replicate, that means that the study or the finding is false," Johanna Cohoon, a project coordinator on the Reproducibility Project, told BuzzFeed News.
Instead, the effort is a first step to figuring out what biases and variables might be affecting the original results. Nothing can ever be an exact replica of an experiment, and small changes might have big effects. "It's intended to be a conversation starter, and to incite action. It is not definitive, and it's not the end of a story," Cohoon said.
Throughout the history of psychology, there have been many solid hypotheses that failed to replicate initially, Jim Coan, a psychology professor at the University of Virginia who was not involved in the Reproducibility Project, told BuzzFeed News.
That includes the famous theory of cognitive dissonance, the idea that holding two conflicting beliefs at the same time causes mental discomfort that people are driven to resolve, as well as certain links between patterns of brain activity and personality traits.
"If we had stopped with any one of those failures, we would have been wrong," Coan said.
Conversely, even if a study replicates perfectly, it doesn't necessarily mean that its underlying theory is correct. "Even if a theory is wrong it may be possible to reproduce the results exactly," Gary King, the director of the Harvard Institute for Quantitative Social Science, told BuzzFeed News.
Just as important as direct replication, King added, is so-called theoretical replication, or testing out the conclusion of the first study by doing a different study with the same goal. "You want to see studies replicate, but you also want to see other tests of the same hypothesis," he said.
But observers outside of science are less optimistic.
Chances are, a lot of papers failed to replicate because they were wrong, John Bohannon, a journalist at Science, told BuzzFeed News. Bohannon has made a name for himself by poking fun at bad science, first by getting more than 150 journals to accept a fake paper, and then by hoaxing news organizations into writing about an intentionally bad study that "proved" chocolate helps you lose weight.
"I decide not to cover stuff all the time, even if it's a sexy result and I can totally see how it would make a nice little story," Bohannon told BuzzFeed News. But journalists aren't scientists, so ultimately "the onus is on the journal editors to really clean up their act, to reduce the number of low-powered studies."
There are plenty of examples of famous studies that were totally wrong, like the faked study showing that talking to gay people makes you more likely to support gay marriage.
And some of the studies that didn't replicate in the Reproducibility Project got news coverage when they were published, including one that suggested people who have less faith in free will are more likely to cheat.
So just because scientists can't repeat an experiment once doesn't mean the findings are wrong. But the next time you read about a hot new study, take it with a grain of salt. Or a lot of salt. Or as little salt as you can manage.