Ad
Skip to content

What OpenAI wants to learn from its failed ChatGPT update

Image description
GPT-Image-1 prompted by THE DECODER

Key Points

  • A faulty update for GPT-4o caused ChatGPT to become too agreeable, confirming users' doubts and supporting impulsive actions; OpenAI withdrew the update after three days.
  • OpenAI explained that conflicting training changes and the introduction of a new memory feature weakened earlier safeguards, a problem that internal tests did not detect.
  • In response, OpenAI plans to improve its testing process, make behavioral issues like excessive affirmation a reason to delay launches, communicate more openly, and consider the tool's use in emotional support more carefully.

A recent GPT-4o update made ChatGPT noticeably more agreeable—but with some troubling side effects.

The chatbot not only tried to placate users, but also reinforced their doubts, encouraged impulsive decisions, and sometimes even fanned the flames of anger. In one experiment, ChatGPT went so far as to applaud acute psychotic episodes.

OpenAI rolled back the update after just three days. Now the company says it has figured out what went wrong and plans to rethink how it tests new features.

Reward signals clash

According to OpenAI, several training adjustments collided to cause the problem. The system for handling user feedback (thumbs up/down) ended up weakening the main reward signal and undermined earlier safeguards against excessive agreeableness. The chatbot's new memory feature made the effect even stronger.

Ad
DEC_D_Incontent-1

Internal testing failed to catch these issues. OpenAI says that neither its usual evaluations nor its small-scale user tests flagged any warning signs. Although some experts had raised concerns about ChatGPT's communication style, there were no targeted tests for excessive friendliness.

The decision to roll out the update was ultimately based on positive test results—a move OpenAI now admits was a mistake. "We missed the mark with last week's GPT-4o update," OpenAI CEO Sam Altman wrote on X.

Behavioral issues will block future launches

In response, OpenAI plans to revamp its testing process. From now on, behavioral problems like hallucinations or excessive agreeableness will be enough to prevent an update from going live. The company is also introducing opt-in trials for interested users and stricter pre-release checks.

OpenAI says it will be more transparent about future updates and will clearly document any known limitations. One important takeaway: many people turn to ChatGPT for personal and emotional advice—a use case the company now says it will take more seriously when evaluating safety.

Ad
DEC_D_Incontent-2

AI News Without the Hype – Curated by Humans

As a THE DECODER subscriber, you get ad-free reading, our weekly AI newsletter, the exclusive "AI Radar" Frontier Report 6× per year, access to comments, and our complete archive.

Source: OpenAI