OpenAI has released a detailed security report for GPT-4o. Beyond documenting the model's enhanced capabilities, the report reveals an unexpected behavior: in rare cases, the model can spontaneously mimic the user's voice.

The new AI language model GPT-4o can process text, audio, images, and video, as well as generate text, audio, and images. OpenAI conducted extensive security tests, including assessments by over 100 external experts. These evaluations examined potential risks such as unauthorized voice generation and the creation of false information.

According to the report, OpenAI has implemented numerous safeguards to minimize risks. For example, the model has been trained to reject certain requests, like identifying speakers by their voice. Classifiers have also been developed to block unwanted output.
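
The report describes these safeguards only at a high level. Purely as an illustrative sketch, the two-layer idea - refuse disallowed request categories up front, then screen whatever the model produces with a classifier - might look like the following. Every name here (refusal_policy, output_classifier, guarded_generate) is hypothetical, and the keyword matching merely stands in for whatever trained models OpenAI actually uses.

```python
# Illustrative sketch only: OpenAI has not published its safety pipeline.
# All names and the matching logic below are invented for this example.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CheckResult:
    allowed: bool
    reason: str

# Request categories the report says the model is trained to decline,
# e.g. identifying speakers by their voice.
REFUSED_PHRASES = ["identify the speaker", "whose voice is this"]

def refusal_policy(user_request: str) -> CheckResult:
    """First gate: refuse disallowed request categories up front."""
    lowered = user_request.lower()
    for phrase in REFUSED_PHRASES:
        if phrase in lowered:
            return CheckResult(False, f"refused: request matches '{phrase}'")
    return CheckResult(True, "ok")

def output_classifier(model_output: str) -> CheckResult:
    """Second gate: a trained classifier would score the output here.
    This stub simply passes everything through."""
    return CheckResult(True, "ok")

def guarded_generate(user_request: str,
                     generate: Callable[[str], str]) -> str:
    """Run both gates around an arbitrary generation function."""
    if not refusal_policy(user_request).allowed:
        return "Sorry, I can't help with that."
    output = generate(user_request)
    if not output_classifier(output).allowed:
        return "[output blocked by classifier]"
    return output
```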

GPT-4o spontaneously imitates the user's voice

A particularly curious finding: in rare cases, GPT-4o can briefly imitate the user's voice. During testing, the developers observed that the model sometimes unintentionally continued a sentence in a voice that resembled the user's.

An example of this unintentional voice generation occurred when the model shouted "No!" and then continued the sentence in a voice similar to that of the test subject.

To address this issue, OpenAI has implemented several security measures. The company only allows the model to use predefined voices created in collaboration with professional voice actors - a measure recently announced with the start of the first roll-out phase. Additionally, a classifier has been developed to recognize when the model's output deviates from approved voices. In such cases, the output is blocked.
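
OpenAI does not disclose how this voice classifier works internally. One plausible construction, sketched below purely as an assumption, compares a speaker embedding of each generated audio chunk against embeddings of the approved preset voices and cuts the stream off once similarity drops below a threshold. The embed_speaker function, the threshold value, and all other names are invented for illustration.

```python
# Assumption-laden sketch: OpenAI's actual classifier is not public.
# `embed_speaker` (audio chunk -> embedding vector) and the similarity
# threshold are hypothetical stand-ins.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matches_approved_voice(chunk_embedding: np.ndarray,
                           approved: list[np.ndarray],
                           threshold: float = 0.85) -> bool:
    """True if the chunk is close enough to at least one preset voice.
    The 0.85 threshold is assumed; a real system would tune it."""
    return any(cosine_similarity(chunk_embedding, ref) >= threshold
               for ref in approved)

def filter_audio_stream(chunks, approved, embed_speaker):
    """Yield audio chunks until one deviates from every approved voice,
    then stop - i.e. block the remaining output, as the report describes."""
    for chunk in chunks:
        if not matches_approved_voice(embed_speaker(chunk), approved):
            break
        yield chunk
```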

Despite these measures, OpenAI acknowledges that unintentional voice imitation remains a vulnerability of the model. However, the company emphasizes that the risk is minimized through the use of secondary classifiers.

Only minimal risks in other areas

The report also covers various other aspects of GPT-4o. For instance, tests showed generally reduced safety robustness for audio inputs containing background noise or echoes. According to OpenAI, challenges also remain in preventing the model from generating misinformation and conspiracy theories in audio form.

OpenAI also assessed GPT-4o for potential catastrophic risks in cybersecurity, biological threats, persuasiveness, and model autonomy. The risk was rated as low in three out of four categories, with persuasiveness receiving a medium-high rating.

Summary
  • OpenAI has released a security report for GPT-4o, a new AI language model that can process text, audio, images, and video and generate text, audio, and images. The report reveals that the model can unexpectedly mimic the user's voice in rare cases.
  • Extensive security tests were conducted by over 100 external experts, examining risks such as unauthorized voice generation and false information creation. OpenAI implemented safeguards, including training the model to reject certain requests and developing classifiers to block unwanted output.
  • Despite security measures, unintentional voice imitation remains a vulnerability. OpenAI allows the model to use only predefined voices created with professional voice actors and has developed a classifier to recognize and block output that deviates from approved voices.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.