Meta's Voicebox is like Stable Diffusion for voices: The generative AI model synthesizes speech from text and can be used for various speech tasks. Voicebox generates realistic and expressive voices and allows attributes such as tone, style or accent to be adopted from audio files.

According to Meta, Voicebox outperforms existing speech synthesis models such as Microsoft's VALL-E in terms of speech quality and naturalness. "As the first versatile, efficient model that successfully performs task generalization, we believe Voicebox could usher in a new era of generative AI for speech," Meta said. Due to the risk of misuse, the team has also developed a system for recognizing synthesized speech and has no plans to release Voicebox for the time being.

Video: Meta

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.