A new open-source voice model called Kokoro just landed on Hugging Face, and early tests show it can generate voices that rival commercial services like ElevenLabs. The model has 82 million parameters and currently ranks first in the TTS Spaces Arena. It was trained on fewer than 100 hours of audio data and supports only American and British English for now. Users can currently choose from 10 different voices. While the model shows promise, it has limitations: unlike some commercial alternatives, it can't clone voices, and there are no plans yet to add support for other languages. For developers interested in using Kokoro, the inference code is available under an MIT license, while the model itself uses an Apache 2.0 license.

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.