Over the last week, two overlapping teams of scientists in California released the first results of big antibody surveys to determine how many people have already been infected with the coronavirus. Their estimates were jaw-dropping.
In Silicon Valley, the true number of coronavirus infections could be 50 to 85 times higher than the number of reported ones. And in Los Angeles County, there might be 28 to 55 times more people infected than the official count.
The numbers, covered in the national press and shared widely on social media, suggested that far more people than previously realized have “hidden” infections. If that many people have already gotten sick, it also changes the calculation about how frequently the virus can lead to death. In the US, death rates of confirmed cases are over 5%, a high number driven in part by a lack of diagnostic testing.
But the new numbers out of Northern California suggest the virus may kill a much smaller portion of the wider pool of diagnosed and undiagnosed cases, in this case around 0.12% to 0.2%. That would be closer to the death rate for the flu, which is about 0.1%.
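The arithmetic behind that shift is simple. If every death is counted, the fatality rate among all infections equals the fatality rate among confirmed cases divided by the undercount factor. The Python sketch below is illustrative only. Pairing the US-wide 5% figure with Silicon Valley's 50-fold multiplier is an assumption made here for demonstration, not how the researchers arrived at their 0.12% to 0.2% range.

```python
def infection_fatality_rate(case_fatality_rate, undercount_factor):
    """Fatality rate among all infections, assuming every death is counted.

    deaths / infections = (deaths / confirmed) * (confirmed / infections)
                        = case_fatality_rate / undercount_factor
    """
    return case_fatality_rate / undercount_factor


# Illustrative pairing only: 5% of confirmed cases die, and true
# infections are 50 times the confirmed count.
print(f"{infection_fatality_rate(0.05, 50):.2%}")  # prints 0.10%
```

The larger the hidden pool of infections, the lower the fatality rate among them, which is why the size of that pool matters so much.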
Most experts agree there are far more coronavirus infections in the world than are being counted. But almost as instantly as the California numbers were released, critics called out what they saw as significant problems with, or at least big questions about, how the scientists had arrived at them. Chief among their concerns was the accuracy of the test underpinning both studies, and whether the scientists had fully accounted for the number of false positives it might generate.
“I think the authors of the above-linked paper owe us all an apology,” Andrew Gelman, a statistics and political science professor at Columbia University, wrote on his blog last weekend in reference to the study out of Santa Clara County, home to tech giants like Apple and Google. He added, “I think they need to apologize because these were avoidable screw-ups. They’re the kind of screw-ups that happen if you want to leap out with an exciting finding and you don’t look too carefully at what you might have done wrong.”
The two antibody surveys, led by researchers at Stanford University and the University of Southern California, are the largest conducted in the US to date. Scientists worldwide are counting on widespread use of these blood-based tests, also known as serological tests, to eventually answer important questions about the pandemic, from who might be immune to reinfection to exactly how widespread the disease is. Such studies are underway around the world, from Germany to Italy to New York.
“These are extremely valuable studies, and when they’re done right, they’re going to tell us really important things,” Marm Kilpatrick, an infectious disease researcher at the University of California at Santa Cruz, told BuzzFeed News. “I just think if they’re not done in careful ways, they can mislead us about what’s actually happening.”
Kilpatrick worries that the results of these two studies could in turn erode public trust in the need for lockdowns. “If that’s based on faulty information, that would be terrible,” he said.
Here are some of the biggest criticisms about the studies.
Criticism #1: The scientists sought media attention before having supporting data.
The pandemic has kicked academic publishing into warp speed, and scientists are uploading discoveries to the internet every day, bypassing the normal checks of peer review in favor of quickly sharing information. Even so, both research teams — who share a member, Neeraj Sood of USC — have moved at a pace that’s raised some eyebrows in the scientific community. They floated the possibility of scores of uncounted infections to the media before presenting data to back it up, leading some observers to question whether they had rushed to prove a preconceived theory.
On March 17 in Stat, before the antibody surveys had started, Stanford professor John Ioannidis bemoaned the lack of reliable data about the virus, a “fiasco” that “creates tremendous uncertainty about the risk of dying from Covid-19.” The next week, in a Wall Street Journal op-ed titled “Is the Coronavirus as Deadly as They Say?,” two other Stanford faculty members argued that “projections of the death toll could plausibly be orders of magnitude too high.”
Last Friday, a team led by those three researchers uploaded a preliminary draft, or a preprint, about their Santa Clara County study. By early April, there were 956 confirmed cases there. But based on their serological study of 3,330 people, the researchers concluded that the actual number of infections was between 48,000 and 81,000.
On medRxiv, the preprint server where their results were posted, readers have left 300 comments and counting.
Asked to comment for this story, Jay Bhattacharya, a Stanford professor of medicine and the paper’s senior author, acknowledged by email Monday night that his team had “received a vast number of comments and suggestions on our working paper.” They are planning to soon release a revised version “incorporating many of the suggestions,” with a new appendix “addressing many of the most important criticisms we have heard,” he wrote.
“This is exactly the way peer review should work in science,” he added.
And on Monday afternoon, the Los Angeles results were shared in a press conference staged by health officials from Los Angeles County and Sood, vice dean for research at USC’s Price School of Public Policy and co-leader of the study there. In early April, the county had reported nearly 8,000 cases. But according to the new serological study of 863 people, the researchers estimated the true number of infections was between 221,000 and 442,000.
Those figures, according to an accompanying press release, suggest that the fatality rate is “much lower” than thought.
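The "times higher" multipliers quoted at the top of this story follow directly from dividing each estimated infection count by the confirmed case count. A quick check in Python, using only figures reported here; the Los Angeles count of "nearly 8,000" is treated as exactly 8,000, which is an approximation.

```python
def undercount_factor(estimated_infections, confirmed_cases):
    """How many infections there are for every confirmed case."""
    return estimated_infections / confirmed_cases


santa_clara = [undercount_factor(n, 956) for n in (48_000, 81_000)]
los_angeles = [undercount_factor(n, 8_000) for n in (221_000, 442_000)]
print([round(x) for x in santa_clara])  # [50, 85]
print([round(x) for x in los_angeles])  # [28, 55]
```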
In an unexpected twist later that night, Sood said he learned that a draft of his paper, which had not been released as part of the press conference, had mysteriously been posted to RedState.com, a right-wing blog. The site took it down upon his request, though not before a few scientists found it.
In an interview Tuesday afternoon, Sood said he had no idea how the report wound up online without his permission. “It’s just upsetting to me that it was done, because I really tried to make sure that something like this doesn’t happen,” he told BuzzFeed News. (BuzzFeed News has a cached copy but is not discussing it here.)
Sood said he had no choice but to release the numbers under county rules, because anything the public health department is involved with must be disclosed to the county’s leaders. “But we clearly couched those results as ‘these are preliminary findings,’” he said.
Sood said he plans to eventually post a paper online, but only once it has been peer-reviewed and approved for publication.
“I don’t want ‘crowd peer review’ or whatever you want to call it,” he said. “It’s just too burdensome and I’d rather have a more formal peer review process.”
Still, skipping the traditional step of data sharing didn’t go over well with some scientists.
“You can’t report the conclusions without providing scientific evidence — or you shouldn’t,” said Natalie Dean, a University of Florida biostatistician.
Criticism #2: The antibody test’s accuracy rates may be shakier than presented.
One of scientists’ biggest concerns is that the researchers were overly confident in their test’s false-positive rate and failed to account for the real possibility that it could be lower or higher, a difference that would dramatically affect the studies’ conclusions.
Tests like these look for antibodies formed by the immune system in response to a past infection, and differ from the nasal- and throat-swab diagnostic tests that spot current infections. Antibodies are usually an indicator of immunity against infectious diseases, but since this virus has only existed for about four months, scientists don’t yet know how long such protection might last.
Nevertheless, antibody tests have been touted as key to identifying who might be safe from reinfection and could help reopen the economy. To increase their availability, the FDA is letting them be sold without checking the accuracy rates advertised by their manufacturers. As a result, only four have “emergency” authorization from the agency and more than 120 others have varying — and unverified — degrees of accuracy.
Both California studies used tests from Premier Biotech, a Minnesota-based company. These tests were used because they were donated and their accuracy claims were independently verified at Stanford, Sood said in an interview last week.
Before being deployed in Northern California, Premier’s test kit was run against a total of 401 samples known to be coronavirus-negative: 371 in the manufacturer’s testing, 30 in Stanford’s testing. Across the two sets of results, Premier’s test reported that 399 of the 401 were negative.
The researchers interpreted this to mean that the test most likely had a false-positive rate of 0.5%, according to the report. But the true rate could plausibly fall anywhere between 0.1% and 1.7%, according to the researchers’ “confidence interval,” the range of values that is statistically consistent with the limited validation data.
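An interval like that can be computed from 2 false positives in 401 known negatives with the exact binomial (Clopper-Pearson) method, sketched below using only Python's standard library. The report's exact interval method isn't described here, so treating this one as comparable is an assumption. Its bounds come out near 0.06% and 1.8%, close to but not identical to the reported 0.1% to 1.7%.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion x / n."""

    def bisect(f, target):
        # f decreases as p grows, so move toward larger p while f(p) > target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: p where P(X >= x) = alpha / 2, i.e. P(X <= x - 1) = 1 - alpha / 2.
    lower = 0.0 if x == 0 else bisect(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: p where P(X <= x) = alpha / 2.
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# 2 false positives among 401 samples known to be negative
low, high = clopper_pearson(2, 401)
print(f"point estimate {2 / 401:.2%}, 95% interval {low:.2%} to {high:.2%}")
```

With only two false positives observed, the upper bound sits more than three times above the point estimate, which is the small-sample problem critics raised.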
That matters because the Santa Clara study found antibodies in 50 of the 3,330 participants, or 1.5%. Since the test’s false-positive rate could be as high as 1.7%, it is possible that many of the so-called positives were not, in fact, positive.
“Literally every single one could be a false positive,” Kilpatrick said. “No one thinks all of them were, but the problem is we can’t actually exclude the possibility.”
That possibility is even harder to rule out when actual infections are rare. If only a small fraction of Santa Clara County residents have been infected, false positives will make up a larger share of the test’s positive results.
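That effect is captured by the positive predictive value, the share of positive results that are real. A minimal Python sketch, plugging in the roughly 80% sensitivity and 99.5% specificity implied by the validation figures in this story; the 1% and 10% infection levels are hypothetical, chosen only for contrast.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of positive test results that come from truly infected people."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


for prevalence in (0.01, 0.10):  # hypothetical infection levels
    ppv = positive_predictive_value(prevalence, sensitivity=0.80, specificity=0.995)
    print(f"prevalence {prevalence:.0%}: {ppv:.0%} of positives are real")
```

At 1% prevalence, about 62% of positives are real; at 10%, about 95%. If the false-positive rate were instead 1.7% (specificity 0.983), only about a third of positives at 1% prevalence would be real.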
In their analysis, the researchers adjusted for this range of rates while calculating their infection estimates. But given the small number of samples used to validate the test, coupled with the fact that the test is almost as new as the virus, critics say it’s possible that the true false-positive rates could be even higher than presented. The test also generates a large percentage of false negatives, 20%, with a possible range of up to 28%, according to the combined validation efforts.
“There’s more uncertainty than they’ve accounted for,” Dean said.
The wide range in estimates for infections in Santa Clara County in early April — from 48,000 to 81,000 infections — reflects the difference in accuracy rates calculated for the test across the two times it was validated. Using the manufacturer’s rates to correct for the total, 2.5% of the county was infected. Using Stanford’s, about 4.2% were.
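One standard way to correct a raw positive rate for an imperfect test is the Rogan-Gladen estimator, sketched below in Python. It is an assumption that this matches the researchers' exact procedure, and this version skips the demographic weighting they also applied, so its output is not comparable to their 2.5% and 4.2% figures. It does show how much the false-positive rate matters. With 50 positives out of 3,330 and 80% sensitivity, a 0.5% false-positive rate leaves about 1.3% of participants infected, while a 1.7% rate leaves none.

```python
def corrected_prevalence(raw_rate, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from an imperfect test.

    The observed positive rate mixes true and false positives:
        raw_rate = sensitivity * p + (1 - specificity) * (1 - p)
    Solving for p gives the formula below. Results outside [0, 1] are
    clamped; a negative result means every positive could be false.
    """
    p = (raw_rate + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(p, 0.0), 1.0)


raw = 50 / 3330  # unweighted share of Santa Clara participants who tested positive
for specificity in (0.995, 0.983):  # false-positive rates of 0.5% and 1.7%
    print(f"specificity {specificity:.1%}: {corrected_prevalence(raw, 0.80, specificity):.2%}")
```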
As for the Southern California study, there isn’t yet full data to analyze. But researchers there found antibodies in about 35 out of 863 people, or 4.1% of those tested.
So far, serology tests across the world have produced a wide variety of estimates of the number of true coronavirus infections, with those from the California studies on the lower end. At Wuhan’s Zhongnan Hospital, 2.4% of its 3,600 employees were found to have antibodies. Tests on 500 residents of a German town turned up antibodies in 14% of them. And a study near Boston found that 32% of 200 people had been previously infected.
On Monday, WHO Director-General Tedros Adhanom Ghebreyesus maintained that the prevalence was low, “not more than 2 to 3 percent.”
Even slight increases or decreases in the number of positives matter because, in such a small sample, they could make a big difference in the estimated infections across the population.
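How much one result moves the estimate depends on the sample size. In the Python sketch below, the Santa Clara County population of about 1.92 million is an assumption, back-calculated from this story's pairing of 2.5% with 48,000 infections. The figures are raw, before any test-accuracy or demographic adjustment.

```python
def share_per_positive(sample_size):
    """Fraction of the sample that one positive result represents."""
    return 1 / sample_size


def residents_per_positive(sample_size, population):
    """Unadjusted number of residents one positive result stands in for."""
    return population / sample_size


SANTA_CLARA_POPULATION = 1_920_000  # assumption: 48,000 / 0.025
print(f"Santa Clara: {share_per_positive(3330):.3%} of sample, "
      f"~{residents_per_positive(3330, SANTA_CLARA_POPULATION):,.0f} residents per positive")
print(f"Los Angeles: {share_per_positive(863):.3%} of sample per positive")
```

Each positive in the Santa Clara sample stands in for several hundred residents, and a single positive carries almost four times as much weight in the smaller Los Angeles sample.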
Sood said that he and the Stanford team had done their best to adjust for the test’s false positives and false negatives, while acknowledging that they were taking a second look at their confidence intervals. “As new data comes in about these tests, we will update these results,” he said.
None of this means that testing shouldn’t have been done, or that the researchers shouldn’t have published their data.
The error, observers say, was in not being more upfront about how little the numbers could be trusted.
“The fact that they made mistakes in their data analysis does not mean that their substantive conclusions are wrong (or that they’re right),” Gelman said by email. “It just means that there is more uncertainty than was conveyed in the reports and public statements.”
Criticism #3: The Santa Clara County study picked and sorted participants in questionable ways.
Another aspect of the Santa Clara County study that has been flagged as a major problem is how it found participants: Facebook ads.
Spreading the word about tests through social media, the researchers said, helped the study get off the ground quickly and allowed organizers to target people by zip code and demographic characteristics like sex, race, and age. Then they had people drive through three testing sites.
One potential downside of this approach, though, is that since testing is so scarce in the US, the mention of a test may have drawn disproportionate numbers of people who’d had COVID-19 symptoms but weren’t able to get tested. That could have inflated the number of positive results. It’s unclear by how much: The researchers said they collected data about symptoms but didn’t describe how many of the positive testers had symptoms or what the symptoms were.
This recruiting resulted in a group that was markedly different from Santa Clara County’s overall population in a couple of ways: Certain zip codes and white women were overrepresented, while Latino and Asian people were underrepresented. And because participants were recruited through Facebook, the sample likely excluded people without internet access.
When calculating the estimated infections, the researchers accounted for these differences as well as the test’s accuracy rates in order to try to make their results representative of the county. They didn’t adjust for age, however, even though some age groups were also unrepresentative: 5% of participants were over 64, compared with 13% in the county. Sood said that they did not have enough participants across age groups to adjust for age.
All these decisions, among others, influenced the final estimate of infections. Adjusting for the demographic and geographic differences alone nearly doubled the raw 1.5% positive rate.
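That kind of reweighting is usually done by post-stratification: each group's positive rate is weighted by the group's share of the county rather than its share of the sample. The Python sketch below uses two made-up groups, not the study's data, to show how undersampling a group with a higher positive rate pulls the raw figure down.

```python
def poststratified_rate(strata):
    """Population-weighted positive rate.

    strata: (population_share, sample_size, positives) per group;
    population shares should sum to 1.
    """
    return sum(share * positives / n for share, n, positives in strata)


# Hypothetical groups: A is overrepresented in the sample, B underrepresented.
strata = [
    (0.4, 2000, 20),  # group A: 40% of county, 1% tested positive
    (0.6, 500, 15),   # group B: 60% of county, 3% tested positive
]
raw = sum(p for _, _, p in strata) / sum(n for _, n, _ in strata)
print(f"raw {raw:.1%}, weighted {poststratified_rate(strata):.1%}")  # raw 1.4%, weighted 2.2%
```

The weights can only correct for characteristics the researchers measured; they cannot fix a skew, such as symptomatic people volunteering more often, that isn't captured by the groups.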
Kilpatrick believes the researchers did themselves a disservice by not recruiting a more representative group from the get-go. “If the group that did the Stanford study had asked any of the scientists who do these studies all the time, ‘We’re thinking of recruiting on Facebook,’ we’d say, ‘Don’t do it,’” he said.
An ideal way to recruit, Kilpatrick said, would be to use a county database of addresses and send letters to a subset of random addresses, making sure that any one neighborhood isn’t overrepresented. Of course, he acknowledged, there’s always the chance that lots of people wouldn’t respond anyway.
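Kilpatrick's suggestion amounts to stratified random sampling: draw the same fraction of addresses from every neighborhood so that none is overrepresented. A minimal Python sketch; the (neighborhood, address) input format and the sample county are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_address_sample(addresses, fraction, seed=0):
    """Randomly pick the same fraction of addresses from each neighborhood.

    addresses: list of (neighborhood, address) pairs.
    """
    rng = random.Random(seed)
    by_neighborhood = defaultdict(list)
    for neighborhood, address in addresses:
        by_neighborhood[neighborhood].append(address)
    sample = []
    for neighborhood, group in by_neighborhood.items():
        k = max(1, round(len(group) * fraction))  # at least one letter per neighborhood
        sample.extend((neighborhood, a) for a in rng.sample(group, k))
    return sample


# Hypothetical county: neighborhood A has twice as many addresses as B.
county = [("A", f"A-{i}") for i in range(100)] + [("B", f"B-{i}") for i in range(50)]
letters = stratified_address_sample(county, fraction=0.1)
print(Counter(n for n, _ in letters))  # Counter({'A': 10, 'B': 5})
```

As Kilpatrick notes, this controls who is invited, not who responds.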
Other serology studies have taken their own approaches to finding participants. In the Boston suburb of Chelsea, researchers pinpricked the fingers of random passersby in Bellingham Square. Starting this week, New York is testing more than 3,000 people in supermarkets across the state.
For the Los Angeles County study, Sood said he and his team went a different way: They enlisted a market research firm with a proprietary database of thousands of emails and phone numbers of county residents. They invited a random subset to participate in a study “about COVID,” but didn’t say it was about testing.
The team set about recruiting people to fulfill quotas for race/ethnicity, age, and so on, based on the county’s demographics. Once a subgroup’s quota was met, they stopped enrolling people. To make sure they were reaching underrepresented groups, the market research firm made follow-up calls to people in those categories.
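The quota approach can be sketched as a simple enrollment loop that accepts respondents in the order they reply until their subgroup's target is filled. This is a hypothetical Python illustration, not the market research firm's actual system; the age-group labels and quotas are invented.

```python
def enroll_by_quota(respondents, quotas):
    """Enroll respondents in arrival order until each subgroup's quota is full.

    respondents: iterable of (respondent_id, subgroup) pairs.
    quotas: dict mapping subgroup -> target count.
    """
    counts = {group: 0 for group in quotas}
    enrolled = []
    for respondent_id, group in respondents:
        if group in counts and counts[group] < quotas[group]:
            counts[group] += 1
            enrolled.append(respondent_id)
    return enrolled, counts


respondents = [(1, "18-34"), (2, "18-34"), (3, "18-34"), (4, "65+"), (5, "65+")]
quotas = {"18-34": 2, "65+": 1}
enrolled, counts = enroll_by_quota(respondents, quotas)
print(enrolled, counts)  # [1, 2, 4] {'18-34': 2, '65+': 1}
```

Groups that fill slowly are the ones that need the follow-up calls the firm made; the loop alone only stops over-enrolling groups that respond quickly.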
Participants were then invited to drive through six testing sites on a recent weekend. Staff also went to some respondents’ homes to do testing there.
Even though Sood says they went to great lengths to make the group representative of the population, he acknowledged that there is no “perfect” recruitment strategy.
Of the Santa Clara County study, he said the team had done their best with limited resources. “We still thought it was worthwhile doing it even though we fully recognize our methods were not anywhere close to perfect,” he said. “We still thought it would provide useful information and it would add to the debate about what’s going on.”
Few people would turn down the chance to find out whether they’ve had the coronavirus. But Dean questioned whether, from a public health messaging standpoint, it is helpful to fixate on these infection estimates when they are so preliminary.
No matter how many people may or may not be infected — numbers that scientists won’t be able to pin down for a long time — the real numbers that matter right now, in terms of conveying the threat of the disease, are those of the bodies ending up in ICU beds and funeral homes.
“Either way, we’re ending up with a lot of people being hospitalized and dying,” she said. “Everyone needs to keep that part in mind.”