The AI company Play.ht markets its product with an unusual idea: In an AI-generated podcast, Apple co-founder Steve Jobs, who died in 2011, speaks with podcast star Joe Rogan.

Synthetic voices have made enormous progress in recent years thanks to machine learning: the choppy robotic stutter has long since given way to fluent speech that is increasingly dynamic in intonation and thus more emotional.

Voices and script are generated with AI

The company Play.ht demonstrates this in a new podcast project generated entirely with AI. Play.ht sells services for machine voices in various quality levels and formats. For example, a Play.ht service automatically reads blog articles in a more or less natural-sounding voice.

"At Play.ht, We believe in a future where all content creation will be generated by AI but guided by humans, and the most creative work will depend on the human's ability to articulate their desired creation to the machine," the company writes.

The voices in the podcast are rendered using Play.ht's "Ultra-realistic Voices" feature. According to the company, this is "the latest generation" of machine voices that are "almost indistinguishable" from human voices. Listen and judge for yourself.

To train the voice generators, the company used audio data available online from Rogan and Jobs. Joe Rogan in particular offers a large amount of training material with his numerous video podcasts. In the past, there have already been quite successful attempts to replace Rogan with AI-generated content.

Play.ht generated the podcast script using fine-tuned language models. For the Steve Jobs episode, the company trained a language model on the Jobs biography and also incorporated "all recordings that could be found online" into the training.

Play.ht is collecting ideas from users for more unusual AI-generated podcasts. Currently at the top of the list is a conversation between Buddha and Einstein.

Play.ht's podcast project is just one example that demonstrates the progress of synthetic voices and AI audio in general. Similar to image generators like DALL-E 2 or Midjourney, AI-generated audio could transform labor markets. Recently, Meta researchers introduced a new AI system that can generate audio based on text.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • The AI company Play.ht specializes in synthetic voices.
  • A new service is designed to generate particularly high-quality voices that are almost indistinguishable from human voices.
  • Play.ht demonstrates this with a fully AI-generated podcast featuring Joe Rogan and Steve Jobs.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.