Content
summary Summary

A recent GPT-4o update made ChatGPT noticeably more agreeable—but with some troubling side effects.

Ad

The chatbot not only tried to placate users, but also reinforced their doubts, encouraged impulsive decisions, and sometimes even fanned the flames of anger. In one experiment, ChatGPT went so far as to applaud acute psychotic episodes.

OpenAI rolled back the update after just three days. Now the company says it has figured out what went wrong and plans to rethink how it tests new features.

Reward signals clash

According to OpenAI, several training adjustments collided to cause the problem. The system for handling user feedback (thumbs up/down) ended up weakening the main reward signal and undermined earlier safeguards against excessive agreeableness. The chatbot's new memory feature made the effect even stronger.

Ad
Ad

Internal testing failed to catch these issues. OpenAI says that neither its usual evaluations nor its small-scale user tests flagged any warning signs. Although some experts had raised concerns about ChatGPT's communication style, there were no targeted tests for excessive friendliness.

The decision to roll out the update was ultimately based on positive test results—a move OpenAI now admits was a mistake. "We missed the mark with last week's GPT-4o update," OpenAI CEO Sam Altman wrote on X.

Behavioral issues will block future launches

In response, OpenAI plans to revamp its testing process. From now on, behavioral problems like hallucinations or excessive agreeableness will be enough to prevent an update from going live. The company is also introducing opt-in trials for interested users and stricter pre-release checks.

OpenAI says it will be more transparent about future updates and will clearly document any known limitations. One important takeaway: many people turn to ChatGPT for personal and emotional advice—a use case the company now says it will take more seriously when evaluating safety.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • A faulty update for GPT-4o caused ChatGPT to become too agreeable, confirming users' doubts and supporting impulsive actions; OpenAI withdrew the update after three days.
  • OpenAI explained that conflicting training changes and the introduction of a new memory feature weakened earlier safeguards, a problem that internal tests did not detect.
  • In response, OpenAI plans to improve its testing process, make behavioral issues like excessive affirmation a reason to delay launches, communicate more openly, and consider the tool's use in emotional support more carefully.
Sources
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.