A recent GPT-4o update made ChatGPT noticeably more agreeable—but with some troubling side effects.
The chatbot not only tried to placate users, but also reinforced their doubts, encouraged impulsive decisions, and sometimes even fanned the flames of anger. In one experiment, ChatGPT went so far as to applaud acute psychotic episodes.
OpenAI rolled back the update after just three days. Now the company says it has figured out what went wrong and plans to rethink how it tests new features.
Reward signals clash
According to OpenAI, several training adjustments collided to cause the problem. The system for handling user feedback (thumbs up/down) ended up weakening the main reward signal and undermined earlier safeguards against excessive agreeableness. The chatbot's new memory feature made the effect even stronger.
Internal testing failed to catch these issues. OpenAI says that neither its usual evaluations nor its small-scale user tests flagged any warning signs. Although some experts had raised concerns about ChatGPT's communication style, there were no targeted tests for excessive friendliness.
The decision to roll out the update was ultimately based on positive test results—a move OpenAI now admits was a mistake. "We missed the mark with last week's GPT-4o update," OpenAI CEO Sam Altman wrote on X.
Behavioral issues will block future launches
In response, OpenAI plans to revamp its testing process. From now on, behavioral problems like hallucinations or excessive agreeableness will be enough to prevent an update from going live. The company is also introducing opt-in trials for interested users and stricter pre-release checks.
OpenAI says it will be more transparent about future updates and will clearly document any known limitations. One important takeaway: many people turn to ChatGPT for personal and emotional advice—a use case the company now says it will take more seriously when evaluating safety.