Voxtral Transcribe 2 offers speech recognition at $0.003 per minute
Mistral AI has launched Voxtral Transcribe 2, undercutting competitors on speech recognition pricing. The second-generation models start at $0.003 per minute and, according to Mistral, beat GPT-4o mini Transcribe, Gemini 2.5 Flash, and Deepgram Nova on accuracy. The model family comes in two variants: Voxtral Mini Transcribe V2 for processing larger audio files, and Voxtral Realtime for real-time applications with latency under 200 milliseconds. Voxtral Realtime costs twice as much and uses a custom streaming architecture that transcribes audio as it arrives. It is designed for voice assistants, live captioning, and call center analytics.
Both models support 13 languages, including German, English, and Chinese. New features include speaker recognition, word-level timestamps, and support for recordings up to three hours long. Voxtral Realtime is available as an open-weights model under the Apache 2.0 license on Hugging Face and via the API, while Voxtral Mini Transcribe V2 is available only through Le Chat, the Mistral API, and a playground. Mistral released the first Voxtral generation in July 2025.
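To put the pricing in perspective, here is a minimal Python sketch that estimates the cost of transcribing a three-hour recording. It assumes Voxtral Mini Transcribe V2 is billed at the $0.003 starting price and that Voxtral Realtime's "twice as much" means $0.006 per minute. The dictionary keys are display names chosen for illustration, not official API model identifiers, and actual billing granularity (per second, per started minute) is not stated in the announcement.

```python
# Assumed per-minute prices in USD, derived from the announcement:
# $0.003 starting price, Realtime "twice as much". Keys are display names,
# not official API model IDs.
PRICE_PER_MINUTE_USD = {
    "Voxtral Mini Transcribe V2": 0.003,
    "Voxtral Realtime": 0.006,
}


def estimate_cost(model: str, minutes: float) -> float:
    """Return an estimated transcription cost in USD, assuming linear per-minute billing."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Round to avoid floating-point noise such as 0.5399999999999999.
    return round(PRICE_PER_MINUTE_USD[model] * minutes, 6)


if __name__ == "__main__":
    three_hours = 3 * 60  # the maximum recording length mentioned in the article
    for name in PRICE_PER_MINUTE_USD:
        print(f"{name}: ${estimate_cost(name, three_hours):.2f} for a 3-hour recording")
```

Under these assumptions, a full three-hour recording comes to $0.54 with Voxtral Mini Transcribe V2 and $1.08 with Voxtral Realtime.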