ElevenLabs has introduced Flash, a new speech synthesis model designed for ultra-fast performance.
ElevenLabs has unveiled Flash, its newest text-to-speech model built specifically for speed. The system can convert text into speech in just 75 milliseconds (excluding network and application delays), putting it among the fastest AI voice models currently available.
The company designed Flash with real-time applications in mind, particularly for conversational AI agents where quick response times are essential.
While the model prioritizes speed, ElevenLabs acknowledges some trade-offs: Flash's voices aren't quite as expressive as those generated by the slower Turbo models. However, ElevenLabs believes that most users won't notice the difference in real-time applications. The company's blind tests suggest that Flash outperforms other ultra-low-latency models on the market.
Two versions, multiple languages
Flash comes in two variants: v2 and v2.5. The base version (v2) works exclusively with English content, while v2.5 supports 32 different languages. Users can access either version through ElevenLabs' Conversational AI platform or directly via API using the identifiers "eleven_flash_v2" and "eleven_flash_v2_5."
Both versions share the same pricing structure: one credit for every two characters of text processed, so a 1,000-character passage costs 500 credits.
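As a rough illustration of selecting a Flash model via the API, here is a minimal sketch using the public v1 text-to-speech REST endpoint. The endpoint path and "model_id" field follow ElevenLabs' documented API shape; the voice ID and API key are placeholders, not real values.

```python
# Sketch: selecting a Flash model through the ElevenLabs TTS API.
# VOICE_ID and YOUR_KEY are placeholders, not real credentials.
import json
import urllib.request

API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def build_request(text: str, voice_id: str, api_key: str,
                  model_id: str = "eleven_flash_v2_5") -> urllib.request.Request:
    """Build a POST request that picks a Flash model via model_id."""
    payload = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=payload,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Hello, world", voice_id="VOICE_ID", api_key="YOUR_KEY")
# Sending the request would return the synthesized audio bytes:
# audio = urllib.request.urlopen(req).read()
```

Swapping the default `model_id` for "eleven_flash_v2" would select the English-only variant instead.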