One day a few years ago, while talking to a journalist in her office, Harvard computer science professor Latanya Sweeney typed her name into Google’s search bar to pull up a study. As the results filled in, the page also displayed an alarming advertisement for a service offering to search arrest records for her name.
The journalist perked up. Forget the paper, he told her, tell me about that arrest. That’s impossible, Sweeney replied — she had never been arrested.
Sweeney decided to get to the bottom of the phenomenon. Armed with the names of 2,000 real people, she ran searches on Google.com and Reuters.com and noted how the ads that appeared varied depending on the person’s race. In a 2013 study, she reported that searches for names more commonly given to African Americans were 25% more likely to produce an ad suggestive of an arrest record.
Today, these sorts of Google searches no longer turn up arrest ads. But this kind of algorithmic discrimination is likely to show up in all sorts of online services, according to a study published in Science on Thursday.
The authors raise the possibility that language algorithms — such as those being developed to power chat bots, search engines, and online translators — could inadvertently teach themselves to be sexist and racist simply by studying the way people use words.
The results suggest that the bots were absorbing hints of human feelings — and failings, the researchers say.
“At the fundamental level these models are carrying the bias within them,” study author Aylin Caliskan, a postdoctoral researcher at Princeton, told BuzzFeed News.
Widely used algorithms that screen resumes, for example, could be ranking a woman programmer’s application lower than a man’s. In 2015, a group from Carnegie Mellon University observed that Google was more likely to display ads for high-paying jobs if the algorithm believed the person visiting the site was a man.
Caliskan and her colleagues focused on two kinds of popular “word-embedders” — algorithms that translate words into numbers for computers to understand. The researchers trained each bot on a different dataset of the English language: the “Common Crawl,” a database of text scraped from the web containing about 840 billion words, and a Google News database containing some 100 billion words.
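To make the idea concrete, here is a minimal sketch of what a word embedding is, using tiny made-up vectors rather than the study’s actual models. Real embeddings, such as GloVe vectors trained on the Common Crawl or word2vec vectors trained on Google News, assign each word hundreds of dimensions, but the basic move of comparing word vectors is the same.

```python
# Minimal sketch of a word embedding: each word maps to a vector of numbers,
# and the cosine similarity between two vectors stands in for how related the
# words are. The 4-dimensional vectors below are invented for illustration;
# they are not taken from the study's models.
import numpy as np

embeddings = {
    "flower":   np.array([ 0.9, 0.1,  0.3, 0.0]),
    "pleasant": np.array([ 0.8, 0.2,  0.4, 0.1]),
    "gun":      np.array([-0.7, 0.6, -0.2, 0.3]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors (1.0 = same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(embeddings["flower"], embeddings["pleasant"]))  # high
print(cosine_similarity(embeddings["gun"], embeddings["pleasant"]))     # low
```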
The study found that these simple word associations could give the bots knowledge about how people judge objects: Flowers and musical instruments, for example, were deemed more pleasant than guns and bugs.
The researchers also built their own version of a psychology test that is used to reveal people’s hidden biases, adapting it to probe the algorithms instead.
The algorithms more often linked European American names, such as Adam, Paul, Amanda, and Megan, with feel-good words like “cheer,” “pleasure,” and “miracle” than they did African American names like Jerome, Lavon, Latisha, and Shaniqua. Conversely, the algorithms matched words like “abuse” and “murder” more strongly with the African American names than with the European American ones.
Female names like Amy, Lisa, and Ann tended to be linked to domestic words like “home,” “children,” and “marriage,” whereas male names like John, Paul, and Mike were associated with job terms like “salary,” “management,” and “professional.”
The software also linked male descriptors (brother, father, son) with scientific words (physics, chemistry, NASA), and female descriptors (mother, sister, aunt) with arts terms (poetry, art, drama).
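A rough sketch of this kind of association measure is below: a word’s “bias score” is how much more similar it is, on average, to one set of attribute words than to another. This mirrors the structure of the test the researchers describe, but the word lists and the lookup step in the usage comments are placeholders, not the paper’s actual data or code.

```python
# Sketch of a word-embedding association measure: compare a target word's
# average similarity to one attribute set against its average similarity to
# another. A positive score means the word sits closer to set A than to set B.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word_vec, attr_a_vecs, attr_b_vecs):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    return (np.mean([cosine(word_vec, a) for a in attr_a_vecs]) -
            np.mean([cosine(word_vec, b) for b in attr_b_vecs]))

# Hypothetical usage, assuming a function vec(word) that looks up a
# pretrained embedding (not defined here):
#   pleasant   = [vec(w) for w in ("cheer", "pleasure", "miracle")]
#   unpleasant = [vec(w) for w in ("abuse", "murder", "agony")]
#   print(association(vec("Megan"),   pleasant, unpleasant))
#   print(association(vec("Latisha"), pleasant, unpleasant))
```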
“We’ve used machine learning to show that this stuff is in our language,” study author Joanna Bryson, professor of artificial intelligence at the University of Bath in the UK, told BuzzFeed News.
That algorithms are deeply biased is not a new idea. Researchers who study ethics in AI have been arguing for a decade that “fairness” should be programmed into algorithms. As AI gets smarter, they say, software could make for a society that is less fair and less just.
Such signs are already here: Last year, when Amazon launched its same-day delivery service in major markets, predominantly black neighborhoods were excluded “to varying degrees” in six cities, Bloomberg found. According to an analysis by ProPublica, Asian customers were nearly twice as likely as non-Asians to be quoted a higher price by the test-prep company Princeton Review. ProPublica also showed that software used by courts to predict future criminals was biased against black Americans. On social media, people have been regularly calling out racially biased search results.
“It's vital that we have a better understanding of where bias creeps in before these systems are applied in areas like criminal justice, employment and health,” Kate Crawford, principal researcher at Microsoft Research, told BuzzFeed News in an email. Last year, Crawford worked with the Obama administration to run workshops on the social impact of AI, and this year co-founded the AI Now Initiative to examine how to ethically deploy such technology.
While the Science paper’s findings are in some ways expected, the results show how systemic the problem of bias is, Sorelle Friedler, associate professor of computer science at Haverford College, told BuzzFeed News.
“I see it as important to scientifically validate them so that we can build on this — so that we can say, now that we know this is happening, what do we do about this?” she said.
Because these kinds of language-learning bots will soon be common, it’s likely that most of us will routinely encounter such bias, Hal Daumé III, a professor of computer science at the University of Maryland, told BuzzFeed News.
Some researchers at Google are finding ways to make decision-making in AI more transparent, according to company spokesperson Charina Choi. “We’re quite sensitive to the effect of algorithms, and spend a lot of time rigorously testing our products with users’ feedback,” Choi wrote in an email to BuzzFeed News.
Facebook, which is developing chat programs powered by AI, declined to comment. But the Science study showed how these biases crop up in at least one popular service: Google Translate.
When translating from Turkish, which does not have gendered pronouns, to English, which does, the service associates male pronouns with the word “doctor” and female pronouns with the word “nurse.”
When translating into Spanish, English, Portuguese, Russian, German, and French, the tool brought up similar results, the study found.
“I think almost any company that does natural language processing-like tasks is going to be using word-embedding in some form or another,” Daumé said. “I don’t see any reason to believe that [the study’s word-embedder] is any more sexist or racist than any of the others.”