OpenAI releases new AI voice models with customizable speaking styles
OpenAI has released a new generation of audio models that let developers customize how their AI assistants speak. The update includes improved speech recognition and the ability to control an AI's speaking style through simple text commands.
According to OpenAI, its new gpt-4o-transcribe and gpt-4o-mini-transcribe models achieve lower error rates than the earlier Whisper models when converting speech to text. The company says the new models perform better in challenging conditions such as heavy accents, noisy environments, and varying speech speeds.
The most notable feature comes with the new gpt-4o-mini-tts text-to-speech model. The system responds to style instructions like "speak like a pirate" or "tell this as a bedtime story," letting developers shape how their AI voices deliver a message. These capabilities are built on OpenAI's GPT-4o and GPT-4o-mini architectures, which handle multiple types of media input and output.
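As a rough illustration, here is how such a style instruction might be passed to the model through OpenAI's Python SDK. The voice name, the sample text, and the exact shape of the instructions field are assumptions for this sketch, not confirmed documentation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Sketch only: model name and preset voice follow OpenAI's announcement;
# the free-text "instructions" field for steering delivery style is an
# assumption based on the described capability.
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # one of OpenAI's preset voices
    input="Once upon a time, in a quiet harbor town, a little boat fell asleep.",
    instructions="Tell this as a calm, slow bedtime story.",
)

# Write the returned audio bytes to disk
with open("bedtime_story.mp3", "wb") as f:
    f.write(response.read())
```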
According to OpenAI, the improved performance comes from specialized pre-training on audio datasets for more nuanced speech understanding, more efficient model distillation techniques, and expanded use of reinforcement learning in speech recognition. The company also implemented "self-play" methods to simulate natural conversation patterns.
Developer access and limitations
Developers can now access these models through OpenAI's API and integrate them using the Agents SDK. For real-time applications, OpenAI recommends its Realtime API, which supports speech-to-speech interaction.
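For speech to text, a minimal sketch using the Python SDK might look like the following, assuming the new models slot into the same transcription endpoint previously used for Whisper; the audio file name is illustrative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Assumption: gpt-4o-transcribe plugs into the existing transcription
# endpoint, with only the model name changed from the Whisper calls.
with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```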
For now, the models only work with OpenAI's preset synthetic voices; developers cannot create new voices or clone existing ones. The company says it plans to allow custom voices in the future while maintaining safety standards, and aims to expand into video for multimodal experiences.
This update follows OpenAI's March 2024 introduction of Voice Engine, which was limited to the company's own products and select customers. That earlier model appears to have been superseded by GPT-4o's broader multimodal capabilities.