Balasubramaniyan says voice AI services need to offer security on par with that of other companies that store personal data, like financial or medical information.
“You have to ask the company, ‘How is my AI voice going to be stored? Are you actually storing my recordings? Are you storing it encrypted? Who has access to it?’” Balasubramaniyan says. “It is a part of me. It is my intimate self. I need to protect it just as well.”
Podcastle says the voice models are end-to-end encrypted and that the company doesn’t keep any recordings after creating the model. Only the account holder who recorded the voice clips can access them. Podcastle also doesn’t allow other audio to be uploaded or analyzed on Revoice. In fact, the person creating a copy of their voice has to record the lines of prewritten text directly into Revoice’s app. They can’t just upload a pre-recorded file.
“You are the one giving permission and creating the content,” Podcastle’s Yeritsyan says. “Whether it’s artificial or original, if this is not a deepfaked voice, it’s this person’s voice and he put it out there. I don’t see issues.”
Podcastle hopes that limiting audio generation to a consenting person’s own cloned voice will disincentivize people from making it say anything too horrible. Currently, the service has no content moderation and no restrictions on specific words or phrases. Yeritsyan says it is up to whatever service or outlet publishes the audio—like Spotify, Apple Podcasts, or YouTube—to police the content that gets pushed onto their platforms.
“There are huge moderation teams on any social platforms or any streaming platform,” Yeritsyan says. “So that’s their job to not let anyone else use the fake voice and create something stupid or something not ethical and publish it there.”
Even if the very thorny issue of voice deepfakes and nonconsensual AI clones is addressed, it’s still unclear whether people will accept a computerized clone as a stand-in for a human.
At the end of March, the comedian Drew Carey used another voice AI service, Eleven Labs, to release a whole episode of a radio show read by his voice clone. For the most part, people hated it. Podcasting is an intimate medium, and the distinct human connection you feel when listening to people have a conversation or tell stories is easily lost when the robots step to the microphone.
But what happens when the technology advances to the point that you can’t tell the difference? Does it matter that it’s not really your favorite podcaster in your ear? Cloned AI speech still has a ways to go before it’s indistinguishable from human speech, but it’s catching up quickly. Just a year ago, AI-generated images looked cartoonish, and now they’re realistic enough to fool millions into thinking the Pope had some kick-ass new outerwear. It’s easy to imagine AI-generated audio following a similar trajectory.
There’s also another very human trait driving interest in these AI-powered tools: laziness. AI voice tech—assuming it gets to the point where it can accurately mimic real voices—will make it easy to do quick edits or retakes without having to get the host back into a studio.
“Ultimately, the creator economy is going to win,” Balasubramaniyan says. “No matter how much we think about the ethical implications, it’s going to win out because you’ve just made people’s lives simple.”