OpenAI has updated ChatGPT's voice features for subscribers, aiming to make the AI's speech more natural and expressive.
According to OpenAI, the revamped "Advanced Voice Mode" now delivers smoother, more emotionally nuanced speech, with improvements to intonation, pauses, and the ability to convey empathy or sarcasm in a more lifelike way.
The update also adds real-time translation capabilities. Users can now ask ChatGPT to translate between specific language pairs. The AI will then continuously interpret both sides of a conversation until instructed otherwise. OpenAI suggests this could be useful for scenarios like restaurant orders or multilingual workplace discussions.
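ChatGPT's interpreter runs entirely inside the app, but a developer could approximate the same loop with OpenAI's public API by chaining speech-to-text, translation, and text-to-speech. The sketch below is a hypothetical illustration, not the feature's actual implementation: the interpret_turn helper, the file names, and the model and voice choices (whisper-1, gpt-4o, tts-1, alloy) are assumptions for the example, and it handles one recorded turn at a time rather than streaming live audio.

```python
# Minimal sketch of a two-way interpreter loop built on OpenAI's public API.
# It approximates the ChatGPT app feature described above, one turn at a time.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def interpret_turn(audio_path: str, target_language: str) -> str:
    """Transcribe one speaker's turn, then translate it into the target language."""
    # 1. Speech-to-text with Whisper.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. Translate the transcript with a chat model.
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"Translate the user's message into {target_language}. "
                        "Reply with the translation only."},
            {"role": "user", "content": transcript.text},
        ],
    )
    translated = completion.choices[0].message.content

    # 3. Text-to-speech so the other party hears the translation.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=translated)
    with open("translated_turn.mp3", "wb") as out:
        out.write(speech.content)
    return translated

# Alternate calls for each side of the conversation, e.g. for the restaurant scenario:
# interpret_turn("guest.wav", "Spanish"); interpret_turn("waiter.wav", "English")
```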
Paying users can access these voice improvements across all platforms by tapping the language icon in the chat interface.
Known limitations
Some issues remain. OpenAI notes that users may still encounter occasional drops in audio quality, such as unexpected changes in pitch or volume, which can be more noticeable with certain voices.
In addition, so-called hallucinations continue to occur: ChatGPT sometimes produces odd sounds without being prompted, including snippets that resemble ads, random noises, or even background music.
In one recent case, a user reported that ChatGPT suddenly played an advertisement in the middle of a conversation, even though OpenAI doesn't actually serve ads.
OpenAI first introduced Advanced Voice Mode in May 2024 with a gradual rollout and expanded availability to the EU in October 2024. The goal is to enable natural, real-time interaction with the AI, including the ability to interrupt and express emotion during conversations. If users also turn on their cameras, ChatGPT can comment on objects and surroundings in real time. Google offers similar features in its Gemini app.