NVIDIA NeMo, an open-source conversational AI toolkit, has released Parakeet, a family of automatic speech recognition (ASR) models. Developed in partnership with Suno.ai, the four Parakeet models range from 0.6 to 1.1 billion parameters, transcribe spoken English, and are available for commercial use under the CC BY 4.0 license. The models were trained on 64,000 hours of audio data covering a variety of accents, vocal ranges, and acoustic conditions. According to the developers, the models are robust to non-speech segments such as music and silence, and outperform OpenAI's Whisper v3 in benchmarks. They can also be integrated into projects easily through pre-trained checkpoints. A demo of the 1.1-billion-parameter model is available.