I Used AI To Clone My Voice And Trick My Mom Into Thinking It Was Me
You can watch our journey into the terrifying future of fake news on BuzzFeed News' Follow This series on Netflix.
In January, a man named Aviv Ovadya scared the shit out of me. I’d arranged to chat with him about the future of disinformation, expecting a sober prediction of the coming years as incrementally worse. But Ovadya painted a far bleaker picture: a future in which an array of easy-to-use and seamless technology would democratize the ability to manipulate perception and falsify reality. What happens, he mused, “when anyone can make it appear as if anything has happened, regardless of whether or not it did?”
Ovadya told me about "reality apathy," "human puppets," and "the Infocalypse." It was terrifying, more so because early versions of some of the dystopian technology we discussed are already here; some of it is even available to the public.
Which is how I ended up creating an AI-rendered digital recreation of my voice that was so convincing it fooled the person who arguably knows my voice better than anyone: my mom.
To do it, I used Lyrebird, a free tool for creating "vocal avatars." Lyrebird analyzes the cadence of your speech and the way you pronounce vowels and consonants to create a realistic digital copy of your speech patterns.
The process requires you to read an assortment of mundane phrases like “watching the parody was great entertainment” or “Eric took his vitamins and continued his exercise regime” while your voice is recorded.
Once you’ve input enough examples, Lyrebird’s AI creates a digital copy of your voice. Then you can type anything into a text box and hear it read aloud in your voice.
To demonstrate its AI, Lyrebird used its technology to create a digital copy of Donald Trump's voice.
It only took me about an hour of reading input phrases (about 60) to get a decent rendering of my voice. The first iteration was hardly perfect; it was monotone and robotic — but it sounded like (a robotic) me.
Still, basic elements like pitch and vowel sounds were accurate enough to elicit an "omfg" from my editor.
As I fed Lyrebird more recordings of my real voice, the digital copy of it got better. Specific bits — like my vocal cadence — improved, though the overall quality was still grainy.
When I discussed the recording with my editor, we realized that "decent" was likely good enough to dupe an unsuspecting person into thinking my "vocal avatar" was me. All I'd need to do was suggest that I was calling from an area with poor cell reception.
We decided that unsuspecting person should be my mom. She is intimately familiar with my voice — a difficult target.
To prepare, I typed out a bunch of phrases I was likely to use in a short call. I'd have to keep our chat brief — a long conversation would likely expose more flaws in my digital voice and give me away. I was planning to meet up with my mom for dinner that night, so I figured I'd call to quickly check in to confirm our plans. Then I'd feign losing service and say goodbye.
It worked. Seamlessly. Here's how it went down:
Needless to say, I was quite surprised.
Not only did my Lyrebird voice fool my mom, but she also had a hard time believing it was created by AI. "I didn't know something like that was possible," she told me later on, reminding me that, to her, audio and video manipulation sounds more like science fiction than reality. "I never doubted for a second that it was you."
Lyrebird's technology is far from perfect, but it's evolving fast enough that it doesn't really matter. It's going to get better and it's already good enough to fool my mom. Which is scary, something Lyrebird itself is keenly aware of.
"Imagine that we had decided not to release this technology at all," the company explains on an ethics page on its website. "Others would develop it and who knows if their intentions would be as sincere as ours: they could, for example, only sell the technology to a specific company or an ill-intentioned organization. By contrast, we are making the technology available to anyone and we are introducing it incrementally so that society can adapt to it, leverage its positive aspects for good, while preventing potentially negative applications."
But Lyrebird is far from the only company playing in the precarious realm of audio and video manipulation. To understand the terrifying future of fake news, I embarked on a reporting quest that led me deep into the world of human puppets, AI, deepfakes, and digital harassment. ●