A new open source voice model called Kokoro just landed on Hugging Face, and early tests show it can generate voices that rival commercial services like ElevenLabs. The model has 82 million parameters and currently holds first place in the TTS Spaces Arena. It was trained on less than 100 hours of audio data and supports only American and British English for now. Users can currently choose from 10 different voices.

While the model shows promise, it has limitations. Unlike some commercial alternatives, it can't clone voices, and there are no plans yet to add support for other languages. For developers interested in using Kokoro, the inference code is available under an MIT license, while the model itself uses an Apache 2.0 license.
Now that we have amazing open source TTS with fast inference, what are you building? https://t.co/XTsRwtiq0Q pic.twitter.com/R7HrtB1LeJ
- Victor M (@victormustar) January 13, 2025