OpenAI’s real-time API picks up laughter, accents, and switches languages in real time

OpenAI has launched its "realtime API" for production, moving it out of beta.

The API targets companies and developers building voice assistants for real-world applications like customer support, education, or personal productivity. Its main component, the "gpt-realtime" model, generates and processes speech directly, skipping the usual text conversion. It responds faster, sounds more natural, and handles complex instructions better than previous versions, according to OpenAI.

The company says gpt-realtime can now pick up on nonverbal cues like laughter, switch languages mid-sentence, and adjust its tone on request, for example by speaking "friendly with a French accent" or "fast and professional." The model also features two new voices, Cedar and Marin, along with improvements to the existing voices.

On benchmarks, gpt-realtime reaches 82.8 percent accuracy on Big Bench Audio (up from 65.6 percent), 30.5 percent on MultiChallenge (up from 20.6 percent), and 66.5 percent on ComplexFuncBench (up from 49.7 percent).

Better tool integration and image input

The API now streamlines tool integration. OpenAI says the model is better at picking the right tool, triggering it at the right moment, and using the right arguments, making function calls more dependable. Developers can connect external tools and services through remote MCP servers, while SIP support links the API to phone systems. Reusable prompts allow for saving configurations and tool settings for different use cases.

Image input is now supported. Users can send screenshots or photos in a conversation, and the model can reference them, for example to read text from an image or answer questions about what's shown. Developers control what the model can see.

New options let developers set token limits and trim multi-turn conversations, which helps control costs for longer sessions. Pricing for gpt-realtime is 20 percent lower than for the earlier gpt-4o-realtime-preview model: $32 per million audio input tokens and $64 per million audio output tokens. Cached input tokens are $0.40 per million.
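To make those prices concrete, here is a minimal Python sketch of a session cost estimate. It is an illustration only, not OpenAI code: the function name `estimate_cost` and its parameters are hypothetical, it assumes every token is billed at the audio rates quoted above, and real bills depend on how the API actually counts tokens.

```python
# Prices quoted in the article, in US dollars per one million tokens.
AUDIO_INPUT_PER_M = 32.00
CACHED_INPUT_PER_M = 0.40
AUDIO_OUTPUT_PER_M = 64.00


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimate the cost of a session in US dollars (hypothetical helper).

    cached_input_tokens is the portion of input_tokens that is billed
    at the cheaper cached rate instead of the regular input rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must not be negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")

    uncached_input_tokens = input_tokens - cached_input_tokens
    total = (
        uncached_input_tokens * AUDIO_INPUT_PER_M
        + cached_input_tokens * CACHED_INPUT_PER_M
        + output_tokens * AUDIO_OUTPUT_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 100,000 input tokens (40,000 of them cached), 50,000 output tokens.
    print(f"${estimate_cost(100_000, 50_000, cached_input_tokens=40_000):.3f}")
```

The example shows why caching and trimmed conversations matter for long sessions: under these assumptions, a cached input token costs a small fraction of an uncached one, and output tokens cost twice as much as fresh input.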

OpenAI says the API can detect problematic content and end conversations that break its policies, but the history of language model security suggests this shouldn't be the only safeguard. Developers can add their own safety requirements. For EU users, there are options for storing data within the EU and special privacy rules for businesses.
