Maximilian Schreiner
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.
Zonos can clone your voice and is open source
Zyphra has released Zonos-v0.1, an open-source model that turns text into natural-sounding speech and can clone voices from just seconds of audio. The model supports five languages (English, Japanese, Chinese, French, and German) and gives users control over speaking speed, pitch, audio quality, and emotional tone. According to Zyphra, it generates audio faster than real time on an RTX 4090 GPU.

Zonos is available in two versions: a pure transformer model and a hybrid model that combines state-space models with transformers. Both were trained on approximately 200,000 hours of audio data, primarily in English. Users can run Zonos locally through a Gradio interface, which can be installed via Docker. The model is also accessible through the Zyphra Playground or via API for those who prefer a cloud-based option.
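"Faster than real time" is usually expressed as a real-time factor (RTF): the time spent generating divided by the duration of the audio produced, where a value below 1 means the model outputs speech faster than it takes to play it back. The following is a minimal sketch of that arithmetic. The function names and the timings are hypothetical and are not measurements of Zonos.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def is_faster_than_real_time(generation_seconds: float, audio_seconds: float) -> bool:
    """True when the audio is generated in less time than it takes to play."""
    return real_time_factor(generation_seconds, audio_seconds) < 1.0


# Hypothetical example: 10 seconds of speech generated in 5 seconds.
rtf = real_time_factor(generation_seconds=5.0, audio_seconds=10.0)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 0.50"
```

An RTF of 0.50 in this made-up case would mean the system produces speech twice as fast as playback speed.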
Today, we're excited to announce a beta release of Zonos, a highly expressive TTS model with high fidelity voice cloning.
We release both transformer and SSM-hybrid models under an Apache 2.0 license.
Zonos performs well vs leading TTS providers in quality and expressiveness.
- Zyphra (@ZyphraAI) February 10, 2025