More than half of Americans could be identified from just a sample of their DNA and some sleuthing in public genealogy databases — using methods similar to those used by cops to catch the man they believe is the Golden State Killer.
That’s the conclusion of a new study looking at a database of 1.28 million DNA profiles held by the company MyHeritage, based in Israel. About half of the US population — 60% of those with white European ancestry — have a third cousin or closer relative in the database, the researchers calculated.
That means it would be relatively easy, they say, for investigators to match a DNA sample left at a crime scene to someone in the database, then use census records and other genealogy tools to construct family trees that lead to the culprit.
As more people upload their DNA profiles to genealogy databases, it will only get easier to identify suspects from partial matches to crime-scene DNA. And Yaniv Erlich of Columbia University, who led the research, expects that genealogists will become more skilled at tracking down their quarry.
“I also think that the methods and the tactics will get better and better,” said Erlich, who is chief science officer with MyHeritage. “People will find different tricks.”
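The claim that matches get easier as databases grow can be illustrated with a toy calculation. This is a rough sketch, not the model used in the study: it assumes every relative lands in the database independently with the same probability, and the population size and number of relatives per person below are hypothetical values chosen only to show the trend.

```python
def chance_of_relative_match(database_size: int,
                             population_size: int,
                             relatives_per_person: int) -> float:
    """Probability that at least one of a person's relatives is in the database.

    Toy assumption: each relative is in the database independently, with
    probability database_size / population_size.
    """
    coverage = database_size / population_size
    # Chance that none of the relatives is present, subtracted from 1.
    return 1.0 - (1.0 - coverage) ** relatives_per_person


if __name__ == "__main__":
    # Hypothetical inputs, not figures from the study.
    population = 100_000_000
    relatives = 200  # assumed count of third cousins or closer
    for size in (500_000, 1_000_000, 2_000_000):
        p = chance_of_relative_match(size, population, relatives)
        print(f"{size:>9,} profiles -> {p:.0%} chance of a match")
```

Under these assumptions, the chance of a match rises quickly with each additional batch of uploaded profiles, because only one relative out of many needs to be present.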
Erlich’s results are broadly similar to earlier back-of-the-envelope calculations made by geneticists Graham Coop and Michael Edge of the University of California, Davis, for GEDmatch, a database that contains the DNA profiles of about a million people and was used to help identify the suspected Golden State Killer.
But a genealogist who has already cracked a dozen similar cases using GEDmatch told BuzzFeed News that, in practice, identifying suspects is harder than the researchers assume.
“It’s very difficult, because every case has unique challenges to overcome,” said CeCe Moore, who is working with the company Parabon NanoLabs to help cops solve cold-case murders and rapes.
Identifying someone from a partial DNA match means building family trees that might link a third cousin, for example, to the target. Because family trees contain many branches, it is laborious work. And complications such as adoptions, misunderstandings over who is actually the biological father of a child, or recent immigration from countries that don’t have reliable records of family history can make it impossible to make an identification, Moore said.
Still, in some cases, a target can be identified in just a few hours. In their research paper, published today in Science, Erlich and his colleagues described how they uploaded to GEDmatch the DNA profile of a woman who had previously donated her DNA for the 1000 Genomes Project, an international effort to study human genetic variation. The researchers found that they were able to find her within a day by constructing family trees from her closest relatives in GEDmatch.
In their paper, they say this highlights a risk that people who donated their DNA anonymously for medical research could have their identities exposed.
“The fact that you can do it in a day is fairly revealing,” Edge told BuzzFeed News.
Erlich and his colleagues have a suggestion to protect people’s privacy in the new era of forensic genetic genealogy: They want companies providing genetic tests to apply a secure digital signature to each DNA profile they generate. These digital signatures could then be used to limit the use of genetic genealogy by cops, and by anyone else who might want to try to identify someone from a DNA profile.
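The article does not spell out how such signatures would work. As a minimal sketch of the general idea, a testing company could attach a tag to each profile that a database checks before accepting an upload. The example below uses HMAC from Python's standard library as a simplified stand-in; a real scheme would more likely use public-key signatures so that databases would not need the company's secret. The function names, the key, and the profile format are all hypothetical.

```python
import hashlib
import hmac


def sign_profile(profile_bytes: bytes, company_key: bytes) -> str:
    """Return a hex tag binding the profile contents to the company's key."""
    return hmac.new(company_key, profile_bytes, hashlib.sha256).hexdigest()


def verify_profile(profile_bytes: bytes, signature: str, company_key: bytes) -> bool:
    """Check that the profile is unmodified and was tagged with this key."""
    expected = sign_profile(profile_bytes, company_key)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"hypothetical-company-secret"       # assumed key, for illustration only
    profile = b"rs0000001\tAA\nrs0000002\tCT\n"  # made-up profile contents
    tag = sign_profile(profile, key)
    print(verify_profile(profile, tag, key))            # genuine upload
    print(verify_profile(profile + b"edit", tag, key))  # altered upload
```

A database applying a check like this could refuse profiles that carry no valid tag, which is the kind of gatekeeping the proposal appears to have in mind.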
MyHeritage, like most genetic testing companies, opposes the use of its database by law enforcement without a court order. However, because MyHeritage allows customers to upload profiles produced by other companies into its database, this “digital signature” idea would need buy-in from other firms. The three largest — 23andMe, Ancestry, and Family Tree DNA — declined to comment on Erlich’s proposal.
“We don’t have enough information on the solution proposed by the researchers to evaluate it at this point,” 23andMe spokesperson Andy Kill told BuzzFeed News by email.
GEDmatch, meanwhile, modified its terms of service after the Golden State Killer arrest to warn users that their profiles could be searched “by third parties such as law enforcement agencies to identify the perpetrator of a crime, or to identify remains.”
If cops also had access to the technology, a digital signature scheme could in theory allow GEDmatch to distinguish between criminal investigations, which it allows, and nefarious attempts by others to breach people’s privacy, for example by “outing” research volunteers.
GEDmatch cofounder Curtis Rogers told BuzzFeed News by email that the digital signatures idea “merits serious consideration.” He added: “We at GEDmatch are very concerned about the proper use of genealogical information.”
Paul Holes, a retired investigator with the Contra Costa County District Attorney’s Office who led the team that snared the suspected Golden State Killer, said that cops would be concerned about any privacy controls that might reveal their investigative activities.
“There’s a reason why we did not communicate with GEDmatch about the Golden State Killer case,” Holes told BuzzFeed News. “If we had a leak, you have a very dangerous offender who potentially could decide to skip town, or take a hostage.”
Erlich’s calculations also show that genetic genealogy involves some racial biases that run counter to ways in which criminal investigations often play out. Because black Americans are disproportionately arrested and imprisoned, their profiles are overrepresented in police DNA databases. This makes it more likely that a crime-scene sample from a black suspect will produce a match.
But GEDmatch and other genealogy databases contain a disproportionate number of profiles from white Americans. Erlich and his colleagues found that people with mainly North European ancestry were 30% more likely to have a third cousin or closer in MyHeritage’s database than people whose ancestry is largely African. This means that white suspects would be easier to find using genealogical methods.
Already, police are looking at the success of using genetic genealogy on decades-old cold cases and realizing that the same methods can be used in current investigations. In July, Moore and Parabon used the approach to identify 31-year-old Spencer Glen Monnett as the suspect in the rape of a 79-year-old woman in St. George, Utah, that happened in April of this year.
“A lot of forward-thinking investigators are thinking, ‘Why not get this guy off the streets now?’” Ellen Greytak, Parabon’s director of bioinformatics, told BuzzFeed News.