Users claim Claude AI is getting dumber, Anthropic says it's not

Aug 28, 2024

Anthropic / THE DECODER

Key Points

Users are reporting that Anthropic's Claude chatbot has recently become less capable, with complaints that it quickly forgets tasks and struggles with basic coding, prompting some to cancel their subscriptions.
Anthropic has denied making any changes to the Claude 3.5 Sonnet model or inference pipeline, and is now publishing its system prompts for transparency. Similar complaints surfaced in April 2024 and with ChatGPT in late 2023.
Factors such as users becoming accustomed to AI capabilities, natural variability in results, temporary resource constraints, and processing errors can contribute to perceptions of performance degradation even without significant changes to the underlying models, highlighting the challenges of maintaining and communicating AI performance.

Users are reporting that Anthropic's Claude chatbot has become less capable recently, echoing similar complaints about ChatGPT last year. Anthropic says it hasn't made any changes, highlighting the challenges of maintaining consistent AI performance.

A Reddit post claiming "Claude absolutely got dumbed down recently" gained traction, with many users agreeing the chatbot's abilities have declined. The original poster said Claude now forgets tasks quickly and struggles with basic coding, prompting them to cancel their subscription.

Something is going on in the Web UI and I'm sick of being gaslit and told that it's not. Someone from Anthropic needs to investigate this because too many people are agreeing with me in the comments.
Ad

u/NextgenAITrading on Reddit
Ad
DEC_D_Incontent-1

Anthropic's Alex Albert responded, stating their investigation "does not show any widespread issues" and confirming they haven't altered the Claude 3.5 Sonnet model or inference pipeline.

We'd also like to confirm that we've made no changes to the 3.5 Sonnet model or inference pipeline. If you notice anything specific or replicable, please use the thumbs down button on Claude responses to let us know. That feedback is very helpful.
Ad

Alex Albert, Developer Relations at Anthropic

To increase transparency, Anthropic now publishes its system prompts for Claude models on its website. Similar complaints about Claude surfaced in April 2024, which Anthropic also denied. Some users later reported performance returning to normal.

Recurring pattern of perceived AI degradation

This pattern of users perceiving AI decline followed by company denials has occurred before, notably with ChatGPT in late 2023. Complaints about GPT-4 and GPT-4 Turbo persist today, even for the latest GPT-4o model.

Several factors may explain these perceived declines. Users often become accustomed to AI capabilities and develop unrealistic expectations over time. When ChatGPT launched in November 2022 using GPT-3.5, it initially impressed many. Now, GPT-3.5 appears outdated compared to GPT-4 and similar models.

Natural variability in AI outputs, temporary computing resource constraints, and occasional processing errors also play a role. In our daily use, even reliable prompts sometimes produce subpar results, though regenerating the response usually resolves the issue.

These factors can contribute to the perception of decreased performance even when no significant changes have been made to the underlying AI models. But even when the model is changed or updated, it's often not clear what exactly is going on.

OpenAI recently released an updated GPT-4o variant, stating that users "tend to prefer" the new model because they were unable to provide a concrete changelog. The company said it would like to provide more details about how model responses differ, but cannot due to a lack of advanced research into methods for granularly evaluating and communicating improvements in model behavior.

OpenAI has previously noted that AI behavior can be unpredictable, describing AI training, model tuning, and evaluation as an "artisanal, multi-person effort" rather than a "clean industrial process. Model updates can improve some areas while degrading others. This shows how difficult it is to maintain and communicate generative AI performance at scale.

AI News Without the Hype – Curated by Humans

As a THE DECODER subscriber, you get ad-free reading, our weekly AI newsletter, the exclusive "AI Radar" Frontier Report 6× per year, access to comments, and our complete archive.

Source: Reddit | Reddit | Reddit