Microsoft's MAI-Transcribe-1 runs 2.5x faster than its predecessor at $0.36 per audio hour
Microsoft has introduced MAI-Transcribe-1, a speech-to-text model supporting 25 languages that achieves the lowest word error rate of any model tested on the FLEURS benchmark, beating Scribe v2, Whisper-large-V3, GPT-Transcribe, and Gemini 3.1 Flash-Lite. The model is also built to handle tough recording conditions like background noise, poor audio quality, and overlapping speech, Microsoft says.

Microsoft is rolling out MAI-Transcribe-1 across Copilot Voice and Microsoft Teams. Developers can try it in public preview through Microsoft Foundry and the Microsoft AI Playground. The model runs 2.5 times faster than Microsoft's previous Azure Fast offering and costs $0.36 per audio hour. Combined with MAI-Voice-1 and a language model, it can also power voice agents, Microsoft says.
Cohere and Mistral recently released open-source alternatives that perform at a similar level.