OpenAI announced new features for app developers at its DevDay conference. The company is now making its advanced voice technology available for integration into third-party applications.
The new "Realtime API" lets developers add six AI voices to their apps. These voices are different from those used in ChatGPT. To avoid legal issues, developers can't use third-party voices.
OpenAI showed off a travel planning app built on the Realtime API. Users could talk to an AI assistant about an upcoming London trip and get low-latency spoken answers, and the assistant could annotate a map with restaurant suggestions.
The technology also works for phone calls, for example to place orders. The API does not automatically disclose that callers are hearing an AI voice; for now, OpenAI leaves that disclosure up to developers.
New GPT-4o features and cost savings
OpenAI also announced that developers can now fine-tune GPT-4o with images. According to the company, as few as 100 example images are enough to improve the model's performance on specific visual tasks.
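As an illustration, the sketch below prepares a tiny vision fine-tuning dataset as JSONL and starts a job with the openai Python SDK. The message format, the base model name "gpt-4o-2024-08-06", and the image URLs are assumptions drawn from the documentation at the time, not a definitive recipe.

```python
# Hedged sketch: build a small vision fine-tuning dataset and start a job.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each training example pairs an image (by URL) with the expected answer.
examples = [
    {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What traffic sign is shown?"},
                    {"type": "image_url",
                     "image_url": {"url": "https://example.com/signs/stop.jpg"}},
                ],
            },
            {"role": "assistant", "content": "A stop sign."},
        ]
    },
    # ... around 100 such examples can already improve a narrow visual task,
    # according to OpenAI.
]

with open("vision_train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Upload the file and kick off the fine-tuning job.
training_file = client.files.create(
    file=open("vision_train.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id, model="gpt-4o-2024-08-06"
)
print(job.id, job.status)
```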
A new prompt caching feature aims to cut costs and latency. When a request reuses recently seen input tokens, developers get a 50 percent discount on those tokens along with faster processing times.
Prompt caching is automatically applied to the latest versions of GPT-4o, GPT-4o mini, o1-preview and o1-mini, as well as fine-tuned versions of these models.
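Caching requires no code changes, but it rewards prompts that keep the long, static portion at the front so repeated requests share a prefix. The sketch below assumes the chat completions usage fields reported at the time (prompt_tokens_details.cached_tokens); verify the field names against current docs.

```python
# Sketch of prompt caching in practice: static prefix first, varying tail last.
from openai import OpenAI

client = OpenAI()

# A long, unchanging system prompt; caching kicks in once the shared prefix
# exceeds the minimum length (roughly 1,024 tokens).
SYSTEM_PROMPT = "You are a travel assistant. " + "Detailed policy text... " * 300

for question in ["Best museums in London?", "Where to eat near Soho?"]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # shared prefix
            {"role": "user", "content": question},          # varying suffix
        ],
    )
    # cached_tokens > 0 on repeat requests indicates the discounted prefix.
    details = getattr(response.usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) if details else 0
    print(question, "cached prompt tokens:", cached)
```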
"Model distillation" allows smaller models like GPT-4o mini to be optimized using outputs from larger models. OpenAI is providing new integrated tools for this, including saved completions and evaluation options.
OpenAI is doubling the rate limit for its new o1 model. To help developers get started, the company is offering free training quotas for GPT-4o and GPT-4o mini until the end of October.