Zonos can clone your voice and is open source

Zyphra has released Zonos-v0.1, an open-source model that turns text into natural-sounding speech and can clone a voice from just seconds of audio. The model supports five languages (English, Japanese, Chinese, French, and German) and gives users control over speaking speed, pitch, audio quality, and emotional tone. According to Zyphra, it runs faster than real time on an RTX 4090 GPU.

Zonos is available in two versions: a pure transformer model and a hybrid model that combines state-space models with transformers. Both were trained on approximately 200,000 hours of audio, primarily in English. Users can try Zonos through a Gradio interface, which can be installed locally with Docker. The model is also available through the Zyphra Playground and via API for those who prefer a cloud-based option.
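For readers unfamiliar with the "faster than real time" claim, it is commonly expressed as a real-time factor: processing time divided by the duration of the audio produced. The Python sketch below illustrates that arithmetic only. The function names and the numbers are hypothetical, chosen for illustration, and are not Zyphra's benchmark code or measured Zonos results.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (hypothetical helper).

    A value below 1.0 means the audio was produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_faster_than_real_time(processing_seconds: float, audio_seconds: float) -> bool:
    """True when the real-time factor is below 1.0."""
    return real_time_factor(processing_seconds, audio_seconds) < 1.0


# Illustrative numbers only, not measurements of Zonos:
# 5 seconds of compute for a 10-second clip.
rtf = real_time_factor(processing_seconds=5.0, audio_seconds=10.0)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 0.50"
```

Under this definition, any system with a factor below 1.0 can in principle keep up with live playback, which is the property the announcement attributes to Zonos on an RTX 4090.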

- Zyphra (@ZyphraAI) February 10, 2025
