Send in the clones: Using artificial intelligence to digitally replicate human voices
Thanks to advances in artificial intelligence, it's never been easier or more affordable to make a perfect facsimile of a human voice: a celebrity, a world leader or even a public-radio reporter.
Transcript
MARY LOUISE KELLY, HOST:
Talking machines like Siri and Alexa or your bank's automated customer service line - they are sounding awfully human these days. That's thanks to advances in artificial intelligence, AI. In some cases, it's becoming hard to distinguish synthetic voices from real ones. Chloe Veltman of member station KQED recently got her voice cloned and brings us this story.
CHLOE VELTMAN, BYLINE: The science behind making machines talk just like humans is very complex because our speech patterns are so nuanced. It's taken well over 200 years for synthetic voices to go from the first speaking machine invented by Wolfgang von Kempelen...
(SOUNDBITE OF ARCHIVED RECORDING)
AUTOMATED VOICE #1: Na, na, na, na (ph).
VELTMAN: ...To a Samuel L. Jackson voice clone delivering the weather reports on Alexa today.
(SOUNDBITE OF ARCHIVED RECORDING)
AUTOMATED VOICE #2: (Imitating Samuel L. Jackson) Tonight's forecast calls for showers with a low of 52 degrees.
VELTMAN: So it's quite a shock to find out just how easy it is to order up a fake voice from a speech synthesis company like the one I worked with, San Francisco Bay Area-based Speech Morphing. For a basic conversational build, you record yourself saying a bunch of scripted lines into a mic for roughly an hour, and that's about it.
Here, the explosion of mirth drowned him out. That's what Carnegie did. I'd like to be buried under the Yankee Stadium with JFK.
DANIEL RADZINSKI: All right - no "the" before Yankee Stadium.
VELTMAN: Speech Morphing general manager Daniel Radzinski listens in remotely as I say hundreds of phrases from the script he sent me in advance.
I'd like to be buried under Yankee Stadium with JFK.
The lines aren't as random as they seem. Radzinski says he chooses utterances that will produce a wide enough variety of sounds across a range of emotions to feed a neural network-based AI training system. It essentially teaches itself the specific patterns of a person's speech.
RADZINSKI: It will come out in the right prosody, if you will.
VELTMAN: After we're done, I send him the recordings. From there, the Speech Morphing team breaks down and analyzes my utterances and then builds the model for the AI to learn from. It all takes less than a week. Speech Morphing founder and CEO Fathy Yassa says the possibilities for the Chloe Veltman voice clone - or Chloney, as I've affectionately come to call my robot self - are almost limitless.
FATHY YASSA: So we can make you apologetic. We can make you promotional. We can make you like acting in a theater. We can make you sing eventually - not yet there.
VELTMAN: The global speech and voice recognition industry is worth tens of billions of dollars and is growing fast. Its uses are evident. The technology has given actor Val Kilmer, who lost his voice owing to throat cancer...
(SOUNDBITE OF ARCHIVED RECORDING)
VAL KILMER: My voice as I knew it was taken away from me.
VELTMAN: ...The chance to reclaim something approaching his former vocal powers.
(SOUNDBITE OF ARCHIVED RECORDING)
KILMER: But now I can express myself again.
VELTMAN: It's enabled film directors and game designers to develop characters without the need to have live voice talents on hand, like in the movie "Roadrunner," where an AI was trained on Anthony Bourdain's extensive archive of media appearances to create a digital double of the late chef and TV personality's voice.
(SOUNDBITE OF FILM, "ROADRUNNER")
AUTOMATED VOICE #3: (Imitating Anthony Bourdain) You're probably going to find out about it anyway, so here's a little preemptive truth telling. There's no happy ending.
VELTMAN: As pitch-perfect as this might be, it's been controversial. Some people raised ethical concerns about putting words into Bourdain's mouth that he never actually said while he was alive. A cloned version of Barack Obama's voice warning people about the dangers of fake news hammers the point home. Sometimes we have cause to be wary of machines that sound too much like us.
(SOUNDBITE OF ARCHIVED RECORDING)
AUTOMATED VOICE #4: (Imitating Barack Obama) We're entering an era in which our enemies can make it look like anyone is saying anything at any point in time even if they would never say those things.
VELTMAN: And sometimes we don't even want machines to sound too human because it creeps us out. User experience and voice designer Amy Jimenez Marquez led the Amazon Alexa personality experience design team for four years. She says if you want a digital assistant to do children's story time, a more human-sounding voice might be great, as that's more approachable.
AMY JIMENEZ MARQUEZ: Maybe not something that actually breathes because that's a little bit creepy but, you know, a little more human.
VELTMAN: But for a machine that performs basic tasks like, say, a voice-activated refrigerator, maybe less human is the way to go.
JIMENEZ MARQUEZ: So having something a little more robotic. And you can even create, like, a tinny voice that sounds like an actual robot that's cute. That would be more appropriate for a refrigerator.
AUTOMATED VOICE #5: (Imitating Chloe Veltman) Happy birthday to you. Happy birthday to you. Happy birthday, dear Chloney. Happy birthday to you.
VELTMAN: No, this isn't a talking fridge. Allow me to introduce Chloney, my digital voice double. Chloney may not be able to sing "Happy Birthday" yet, but she can read out news stories I didn't even report myself, like this one ripped from an AP newswire.
AUTOMATED VOICE #5: (Imitating Chloe Veltman) September wasn't exactly the robust month for hiring that many had expected and hoped for.
VELTMAN: She can even do it in Spanish.
AUTOMATED VOICE #5: (Imitating Chloe Veltman, speaking Spanish).
VELTMAN: Chloney sounds a lot like me. Let's hope she doesn't put me out of a job anytime soon.
For NPR News, I'm Chloe Veltman. Or am I?
(SOUNDBITE OF GOTH BABE SONG, "SUNNNN")
Transcript provided by NPR, Copyright NPR.