Nvidia's latest open-source speech recognition models beat OpenAI's Whisper v3
NVIDIA NeMo, an open-source conversational AI toolkit, has released Parakeet, a family of automatic speech recognition (ASR) models. Developed in partnership with Suno.ai, the four Parakeet models range from 0.6 to 1.1 billion parameters, transcribe spoken English, and are available for commercial use under the CC BY 4.0 license. The models were trained on 64,000 hours of audio covering different accents, ranges, and sound conditions. According to the developers, the models are robust to non-speech segments such as music and silence and outperform OpenAI's Whisper v3 in benchmarks. They can also be integrated into projects through pre-trained checkpoints. A demo of the 1.1-billion-parameter model is available online.
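For readers who want to try the checkpoints, loading one through NeMo might look roughly like the sketch below. It is not taken from the announcement: it assumes the NeMo ASR package is installed (for example via `pip install "nemo_toolkit[asr]"`), that the checkpoints are published under IDs of the form `nvidia/parakeet-<decoder>-<size>`, and that `sample.wav` is a local English recording. The helper names `parakeet_checkpoint` and `transcribe_files` are hypothetical.

```python
from typing import List


def parakeet_checkpoint(decoder: str = "rnnt", size: str = "1.1b") -> str:
    """Build a checkpoint ID for one of the four Parakeet models.

    Assumption: the IDs follow the pattern "nvidia/parakeet-<decoder>-<size>".
    """
    if decoder not in ("rnnt", "ctc"):
        raise ValueError(f"unknown decoder: {decoder}")
    if size not in ("0.6b", "1.1b"):
        raise ValueError(f"unknown size: {size}")
    return f"nvidia/parakeet-{decoder}-{size}"


def transcribe_files(paths: List[str], decoder: str = "rnnt", size: str = "1.1b"):
    """Download a pre-trained checkpoint and transcribe the given audio files."""
    # Imported lazily so the helper above works without NeMo installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name=parakeet_checkpoint(decoder, size)
    )
    # The return type (plain strings or hypothesis objects) depends on the
    # NeMo version, so the raw result is passed through unchanged.
    return model.transcribe(paths)


if __name__ == "__main__":
    # "sample.wav" is a placeholder path, not a file shipped with the models.
    print(transcribe_files(["sample.wav"]))
```

The first call downloads the checkpoint, so it needs network access and, for the 1.1-billion-parameter models, several gigabytes of disk space.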