Meta's Voicebox is Stable Diffusion for speech
Meta's Voicebox is like Stable Diffusion for voices: The generative AI model synthesizes speech from text and can be used for various speech tasks. Voicebox generates realistic and expressive voices and allows attributes such as tone, style or accent to be adopted from audio files.
According to Meta, Voicebox outperforms existing speech synthesis models such as Microsoft's VALL-E in terms of speech quality and naturalness. "As the first versatile, efficient model that successfully performs task generalization, we believe Voicebox could usher in a new era of generative AI for speech," Meta said. Due to the risk of misuse, the team has also developed a system for recognizing synthesized speech and has no plans to release Voicebox for the time being.
Video: Meta