Kokoro's open-source TTS model rivals the best with a lean 82 million parameters

Jan 14, 2025

A new open-source voice model called Kokoro just landed on Hugging Face, and early tests show it can generate voices that rival commercial services like ElevenLabs. The model has 82 million parameters and currently holds first place in the TTS Spaces Arena. It was trained on less than 100 hours of audio data and supports only American and British English for now. Users can currently choose from 10 different voices. While the model shows promise, it does have limitations. Unlike some commercial alternatives, it can't clone voices, and there are no plans yet to add support for other languages. For developers interested in using Kokoro, the inference code is available under an MIT license, while the model itself uses an Apache 2.0 license.

Now that we have amazing open source TTS with fast inference, what are you building? https://t.co/XTsRwtiq0Q pic.twitter.com/R7HrtB1LeJ

- Victor M (@victormustar) January 13, 2025
