The team behind NVIDIA NeMo, an open-source conversational AI toolkit, has released Parakeet, a family of automatic speech recognition (ASR) models. Developed in partnership with Suno.ai, the four Parakeet models range from 0.6 to 1.1 billion parameters, transcribe spoken English, and are available for commercial use under the CC BY 4.0 license. The models were trained on 64,000 hours of audio covering a variety of accents, domains, and noise conditions. According to the developers, the models are robust to non-speech segments such as music and silence, and they outperform OpenAI's Whisper v3 in benchmarks. They can also be integrated into projects with little effort through pre-trained checkpoints. A demo of the 1.1-billion-parameter model is available online.
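As a rough illustration of what loading one of these pre-trained checkpoints might look like, here is a minimal Python sketch. It assumes the NeMo toolkit is installed with its ASR collection and that the four checkpoints are published under the names listed below; the helper names `checkpoint_name` and `transcribe_files` are hypothetical and exist only for this example. The return format of `transcribe` differs between NeMo versions, so the sketch passes it through unchanged.

```python
# Assumed checkpoint identifiers for the four Parakeet models
# (two decoder types, two sizes). Verify them against the model cards.
PARAKEET_CHECKPOINTS = {
    ("rnnt", "1.1b"): "nvidia/parakeet-rnnt-1.1b",
    ("ctc", "1.1b"): "nvidia/parakeet-ctc-1.1b",
    ("rnnt", "0.6b"): "nvidia/parakeet-rnnt-0.6b",
    ("ctc", "0.6b"): "nvidia/parakeet-ctc-0.6b",
}


def checkpoint_name(decoder: str, size: str) -> str:
    """Hypothetical helper: map a decoder type and model size to a checkpoint name."""
    key = (decoder.lower(), size.lower())
    if key not in PARAKEET_CHECKPOINTS:
        raise ValueError(f"Unknown Parakeet variant: {decoder!r}, {size!r}")
    return PARAKEET_CHECKPOINTS[key]


def transcribe_files(paths, decoder="rnnt", size="1.1b"):
    """Hypothetical helper: load a checkpoint and transcribe a list of audio files."""
    # Imported lazily so the lookup helper above works without NeMo installed.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use and caches it locally.
    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name=checkpoint_name(decoder, size)
    )
    # The result structure depends on the NeMo version and decoder type.
    return model.transcribe(paths)
```

Calling `transcribe_files(["sample.wav"])` would then download the 1.1-billion-parameter RNN-T checkpoint and transcribe a local English recording; `sample.wav` is a placeholder file name.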