Nvidia's latest open-source speech recognition models beat OpenAI's Whisper v3

Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.

NVIDIA NeMo, an open-source conversational AI toolkit, has released Parakeet, a family of automatic speech recognition (ASR) models. Developed in partnership with Suno.ai, the four Parakeet models range from 0.6 to 1.1 billion parameters, transcribe spoken English, and are available for commercial use under the CC BY 4.0 license. They were trained on 64,000 hours of audio covering a variety of accents, domains, and noise conditions. According to the developers, the models are robust to non-speech segments such as music and silence and outperform OpenAI's Whisper v3 in benchmarks. Pre-trained checkpoints are also available to make it easier to integrate the models into projects. A demo of the 1.1 billion parameter model is available here.
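
The article does not say which metric the benchmarks use, but ASR comparisons of this kind are commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that calculation, not the developers' evaluation code. The function name `word_error_rate` and the example transcripts are made up for illustration, and real evaluations usually apply more text normalization than the simple lowercasing assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercasing and whitespace splitting stand in for the
    # heavier text normalization that real benchmarks typically apply.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcripts for illustration only, not benchmark data.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # prints "WER: 0.222"
```

A lower WER means fewer transcription errors, so a model that "outperforms" another on such a benchmark is one that reaches a lower WER on the same test audio.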

Sources

Nvidia NeMo, Hugging Face

Matthias Bastian
