NVIDIA NeMo, an open-source conversational AI toolkit, has released Parakeet, a family of automatic speech recognition (ASR) models. The four Parakeet models were developed in partnership with Suno.ai and range from 0.6 to 1.1 billion parameters. They transcribe spoken English and are licensed for commercial use under CC BY 4.0. The models were trained on 64,000 hours of audio covering a variety of accents, vocal ranges, and acoustic conditions. According to the developers, the models are robust to non-speech segments such as music and silence, and they outperform OpenAI's Whisper v3 in benchmarks. Because they ship as pre-trained checkpoints, they are easy to integrate into existing projects. A demo of the 1.1 billion parameter model is available here.

Matthias is an online journalist and the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.