The AI company Play.ht markets its product with an unusual idea: In an AI-generated podcast, Apple co-founder Steve Jobs, who died in 2011, speaks with podcast star Joe Rogan.
Synthetic voices have made enormous progress in recent years thanks to machine learning: the choppy robotic stutter has long since given way to fluent speech that is increasingly dynamic in intonation and thus more emotional.
Voices and script are generated with AI
Play.ht demonstrates this with a new podcast project generated entirely with AI. The company sells machine voices at various quality levels and in various formats. One Play.ht service, for example, automatically reads blog articles aloud in a more or less natural-sounding voice.
“At Play.ht, we believe in a future where all content creation will be generated by AI but guided by humans, and the most creative work will depend on the human’s ability to articulate their desired creation to the machine,” the company writes.
The voices in the podcast are rendered using Play.ht’s “Ultra-realistic Voices” feature. According to the company, this is “the latest generation” of machine voices, “almost indistinguishable” from human ones. Judge for yourself.
To train the voice generators, the company used publicly available audio of Rogan and Jobs. Joe Rogan in particular offers a large amount of training material through his many video podcasts. There have already been fairly successful attempts in the past to replace Rogan with AI-generated content.
Play.ht generated the podcast script using fine-tuned language models. For the Steve Jobs episode, the company trained a language model on the Jobs biography and also incorporated “all recordings that could be found online” into the training data.
For future episodes, Play.ht is collecting user suggestions for more unusual AI-generated podcasts. Currently at the top of the list is a conversation between Buddha and Einstein.
Play.ht’s podcast project is just one example of the progress in synthetic voices and AI audio in general. Much like image generators such as DALL-E 2 or Midjourney, AI audio generators could transform labor markets. Meta researchers recently introduced a new AI system that can generate audio from text.