Meta's Fundamental AI Research (FAIR) team has introduced Omnilingual ASR, an automatic speech recognition system that can transcribe spoken language in more than 1,600 languages.
Until now, most speech recognition tools have focused on a few hundred well-resourced languages with plenty of transcribed audio. That left thousands of languages - out of the more than 7,000 spoken worldwide - with little or no AI support.
Omnilingual ASR is built to close that gap. Meta says 500 of the 1,600 supported languages have never been covered by any AI system before. With this release, FAIR sees Omnilingual ASR as a step toward a "universal transcription system" that could help break down global language barriers.
The model's accuracy depends on available training data. According to Meta, Omnilingual ASR delivers a character error rate (CER) below 10 percent for 78 percent of the 1,600 languages tested. For languages with at least ten hours of training audio, 95 percent meet that mark. Even for "low-resource" languages with less than ten hours of audio, 36 percent come in below the 10 percent CER threshold.
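For context, character error rate measures how many character insertions, deletions, and substitutions are needed to turn the model's output into the reference transcript, divided by the reference length and expressed as a percentage. A minimal Python sketch of the standard computation (the metric itself, not Meta's evaluation code):

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hyp` into `ref`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution
            ))
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a percentage of the reference length."""
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)

print(cer("omnilingual speech", "omnilingul speach"))  # ~11.1
```

By this measure, a CER below 10 means fewer than one error per ten reference characters.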
To support further research and real-world use, Meta has also released the Omnilingual ASR Corpus, a large dataset of transcribed speech in 350 underrepresented languages. This data, available under a Creative Commons (CC-BY) license, is meant to help developers and researchers build or adapt speech recognition models for specific local needs.
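For those who want to experiment with the corpus, the sketch below shows how a speech dataset of this kind is typically loaded with the Hugging Face `datasets` library. The dataset identifier is an assumption about where Meta might host it, not a confirmed ID, and the field names will depend on the actual release:

```python
from datasets import load_dataset

# Illustrative only: the dataset ID below is an assumption, not a
# confirmed identifier. Streaming avoids downloading the full corpus.
corpus = load_dataset("facebook/omnilingual-asr-corpus",
                      split="train", streaming=True)

for example in corpus.take(3):
    print(example.keys())  # inspect the schema before building on it
```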
Scaling to new languages with in-context learning
A key feature of Omnilingual ASR is its "Bring Your Own Language" option, which relies on in-context learning. Borrowing a technique from large language models, the system lets users add new languages by providing a few paired audio and text samples. It learns directly from these examples, so there's no need for retraining or heavy computing resources.
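Meta hasn't detailed the programming interface here, so the following is a conceptual sketch only; `FewShotExample` and `transcribe_with_context` are hypothetical stand-ins meant to show the shape of few-shot conditioning, not the real Omnilingual ASR API:

```python
from dataclasses import dataclass

@dataclass
class FewShotExample:
    audio_path: str  # short recording in the new language
    transcript: str  # its verified transcription

def transcribe_with_context(audio_path: str,
                            examples: list[FewShotExample]) -> str:
    """Hypothetical model call: in the real system, the paired examples
    condition the decoder at inference time, the way few-shot prompts
    condition a large language model. No weights are updated."""
    raise NotImplementedError("stand-in for the Omnilingual ASR model")

examples = [
    FewShotExample("samples/greeting.wav", "verified transcript 1"),
    FewShotExample("samples/numbers.wav", "verified transcript 2"),
]
# transcript = transcribe_with_context("samples/new_utterance.wav", examples)
```

The key point is that adaptation happens entirely at inference time: the examples travel with the request instead of changing the model's weights.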
Meta says this approach could, in theory, expand Omnilingual ASR to more than 5,400 languages - far beyond what the industry currently covers. While recognition quality for these minimally supported languages doesn't yet match that of fully trained systems, it brings practical speech recognition to communities that previously had no access.
Open-source release and model options
Meta is releasing Omnilingual ASR as open source under the Apache 2.0 license, so researchers and developers can freely use, modify, and build on the models, including for commercial use. The datasets are available under a CC-BY license.
The Omnilingual ASR family includes models ranging from a lightweight 300-million-parameter version for low-power devices to a 7-billion-parameter version for "top-tier accuracy." All models are built on FAIR's PyTorch-based fairseq2 framework, and Meta provides a public demo.