ElevenLabs and Google dominate Artificial Analysis' updated speech-to-text benchmark

Mar 1, 2026

Artificial Analysis has released version 2.0 of its AA-WER speech-to-text benchmark. ElevenLabs' Scribe v2 leads with a word error rate of 2.3%, followed by Google's Gemini 3 Pro (2.9%) and Mistral's Voxtral Small (3.0%). Google's Gemini 3 Flash (3.1%) and ElevenLabs' older Scribe v1 (3.2%) are close behind. Notably, Google didn't train Gemini specifically for transcription; the strong results come from the model's general multimodal capabilities. OpenAI's popular open-source Whisper Large v3 (4.2%) lands mid-pack, while Alibaba's Qwen3 ASR Flash (5.9%), Amazon's Nova 2 Omni (6.0%), and Rev AI (6.1%) bring up the rear.

Bar chart showing the AA-WER v2.0 overall ranking with word error rates ranging from 2.3% (Scribe v2) to 6.1% (Rev AI). — ElevenLabs' Scribe v2 tops the AA-WER v2.0 overall ranking with the lowest word error rate, followed by Google's Gemini 3 Pro and Mistral's Voxtral Small. | Image: Artificial Analysis

The results hold up in the separate AA-AgentTalk test for speech directed at voice assistants: Scribe v2 (1.6%) and Gemini 3 Pro (1.7%) pull well ahead, with AssemblyAI's Universal-3 Pro taking third at 2.3%.

Bar chart showing the AA-AgentTalk ranking with word error rates ranging from 1.6% (Scribe v2) to 6.1% (Rev AI). — ElevenLabs' Scribe v2 and Google's Gemini 3 Pro also dominate the AA-AgentTalk voice assistant test with the lowest error rates. | Image: Artificial Analysis
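
For readers unfamiliar with the metric: word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It assumes simple lowercasing and whitespace tokenization, and the function name `word_error_rate` is made up for this example; how Artificial Analysis normalizes text for AA-WER is not described here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is lowercased and split on whitespace. Real benchmarks
    usually apply more normalization (punctuation, numbers, spelling variants).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One wrong word out of five reference words.
print(word_error_rate("turn on the kitchen lights", "turn on the kitchen light"))
```

Read this way, a 2.3% word error rate corresponds to roughly 23 word errors per 1,000 reference words.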


Source: via X