OpenAI has moved its Realtime API out of beta and into general availability for production use.
The API targets companies and developers building voice assistants for real-world applications such as customer support, education, and personal productivity. Its core component, the "gpt-realtime" model, processes and generates speech directly, without the usual intermediate conversion to text. According to OpenAI, it responds faster, sounds more natural, and follows complex instructions better than previous versions.
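In practice, a client drives such a speech-to-speech session by exchanging JSON events with the API. The sketch below builds two such events in Python; the event names (`session.update`, `response.create`) and field names follow the beta WebSocket interface and are assumptions here, so the GA interface may differ.

```python
import json

# Sketch: construct the JSON events a client would send over the Realtime API's
# WebSocket connection. Event and field names ("session.update",
# "response.create", "voice", "instructions") follow the beta interface and
# are assumptions; check the current docs before relying on them.

def session_update(voice: str, instructions: str) -> str:
    """Configure the session: chosen voice plus system-style instructions."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "voice": voice,  # e.g. one of the new voices, "cedar" or "marin"
            "instructions": instructions,
        },
    })

def request_response() -> str:
    """Ask the model to generate a spoken reply to the audio sent so far."""
    return json.dumps({"type": "response.create"})

event = json.loads(session_update("marin", "Speak fast and professional."))
print(event["type"])  # session.update
```

The instructions string is where tone directives like "friendly with a French accent" would go.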
The company says gpt-realtime can now pick up on nonverbal cues like laughter, switch languages mid-sentence, and adjust its tone: for example, speaking "friendly with a French accent" or "fast and professional." The model also features two new voices, Cedar and Marin, along with improvements to the existing voices.
Video: OpenAI
On benchmarks, gpt-realtime reaches 82.8 percent accuracy on Big Bench Audio (up from 65.6 percent), 30.5 percent on MultiChallenge (up from 20.6 percent), and 66.5 percent on ComplexFuncBench (up from 49.7 percent).
Better tool integration and image input
The API now streamlines tool integration. OpenAI says the model is better at picking the right tool, triggering it at the right moment, and using the right arguments, making function calls more dependable. Developers can connect external tools and services through SIP and remote MCP servers. Reusable prompts allow for saving configurations and tool settings for different use cases.
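A tool is declared to the model as a JSON description of a function and its parameters. The sketch below shows one such declaration in the JSON Schema shape OpenAI uses for function calling in its text APIs; whether the Realtime API uses exactly this shape is an assumption, and `lookup_order` is a hypothetical example tool.

```python
import json

# Sketch of declaring a tool the model may call mid-conversation. The schema
# mirrors OpenAI's function-calling format for its text APIs; its exact shape
# in the Realtime API is an assumption. "lookup_order" is hypothetical.

lookup_order = {
    "type": "function",
    "name": "lookup_order",  # hypothetical customer-support tool
    "description": "Fetch the status of a customer's order by order ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order identifier, e.g. 'A-10234'.",
            },
        },
        "required": ["order_id"],
    },
}

# The model decides when to call the tool and with which arguments; the
# application executes it and returns the result for the model to speak aloud.
print(json.dumps(lookup_order, indent=2))
```

The improvements OpenAI describes concern the model's side of this contract: choosing the right tool, calling it at the right moment, and filling in arguments correctly.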
Image input is now supported. Users can send screenshots or photos in a conversation, and the model can reference them, for example to read text from an image or answer questions about what's shown. Developers control what the model can see.
New options let developers set token limits and trim multi-turn conversations, which helps control costs for longer sessions. Pricing for gpt-realtime is now 20 percent lower: $32 per million audio input tokens and $64 per million output tokens. Cached input tokens are $0.40 per million.
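At those rates, estimating a session's cost is simple arithmetic. A minimal sketch using the prices quoted above:

```python
# Estimate gpt-realtime session cost from token counts, using the rates quoted
# above: $32 per million audio input tokens, $64 per million audio output
# tokens, and $0.40 per million cached input tokens.

RATES_PER_MILLION = {"audio_in": 32.00, "audio_out": 64.00, "cached_in": 0.40}

def session_cost(audio_in: int, audio_out: int, cached_in: int = 0) -> float:
    """Return the estimated cost in USD for one session's token usage."""
    usage = {"audio_in": audio_in, "audio_out": audio_out, "cached_in": cached_in}
    return sum(RATES_PER_MILLION[k] * n / 1_000_000 for k, n in usage.items())

# Example: 50k audio input tokens, 20k output tokens, 10k cached input tokens.
print(round(session_cost(50_000, 20_000, 10_000), 4))  # 2.884
```

This also shows why the new token limits and conversation trimming matter: output tokens cost twice as much as input, so long multi-turn sessions add up quickly.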
OpenAI says the API can detect problematic content and end conversations that violate its policies, but the track record of language model security suggests this shouldn't be the only safeguard; developers can layer their own safety requirements on top. For EU customers, there are options for storing data within the EU and additional privacy commitments for businesses.