For years, experts have debated the issues raised by deepfake technology. Congress and the military have discussed how to prepare the country for the fraud that is likely to follow as deepfakes become more realistic. Now, a new advancement by Amazon has taken that technology to the next level.
At Amazon’s re:MARS conference in June, Rohit Prasad — head scientist and vice president of Alexa AI — demonstrated how Amazon scientists could recreate any voice based on just a one-minute audio sample.
Amazon’s original Alexa voice debuted in November 2014. In its early years, the voice was heavily criticized. In 2017, VentureBeat wrote, “Alexa is pretty smart, but no matter what the A.I.-powered assistant talks about, there’s no getting around its relatively flat and monotone voice.”
Now, however, text-to-speech (TTS) technology has made major advancements towards more realistic — some argue, too realistic — speech.
As Fast Company reports, in an effort to create more expressive and natural-sounding voices, Amazon, Google, Microsoft, Baidu, and other major players in text-to-speech have all adopted some form of “neural TTS” in recent years. Neural TTS uses deep-learning neural networks trained on human speech to convert any text input into human-sounding speech. Neural systems are capable of learning “not just pronunciation but also patterns of rhythm, stress, and intonation.”
Amazon hasn’t announced when this new voice-cloning capability will be available to developers and the public.
Governments everywhere are struggling to figure out how to adapt to these latest advancements. In America, most deepfakes are considered protected free speech, at least for now. Still, some states have attempted to take action against nefarious uses of the technology. In New York, commercial use of a performer’s synthetic likeness without consent is banned for 40 years after the performer’s death, according to CBS News. California and Texas prohibit deceptive political deepfakes before elections.
While concerns about realistic voice-cloning technology are widespread, developers of the technology, like Amazon, are optimistic. In an email to Fast Company, an Amazon spokesperson wrote: “Personalizing Alexa’s voice is a highly desired feature by our customers, who could use this technology to create many delightful experiences. We are working on improving the fundamental science that we demonstrated at re:MARS and are exploring use cases that will delight our customers, with necessary guardrails to avoid any potential misuse.”