Content
summary Summary

Update from April 12, 2024:

Ad

OpenAI's recently released new GPT-4 version has once again taken the top spot in the chatbot arena. According to the publisher of the benchmark, GPT-4 shows superior performance, especially in coding and reasoning. However, Claude 3 Opus is still ahead when it comes to inputs with more than 500 tokens (Longer Query Arena).

Image: Lmsys.org via X

Original article dated March 27, 2024:

Anthropic's Claude 3 replaces OpenAI's GPT-4 as most popular user-rated LLM

Anthropic's Claude 3 Opus has knocked OpenAI's GPT-4 off the top of the chatbot leaderboard for the first time.

Ad
Ad

According to the Chatbot Arena Leaderboard, Anthropic's Claude 3 Opus has taken the top spot from OpenAI's GPT-4 for the first time. Claude 3 Opus now ranks first based on how real people rate chatbot skills. GPT-4 has been pushed down to second place.

The Chatbot Arena is a benchmark platform created by the Large Model System Organization (LMSYS) to compare the performance of large language models. The Arena pits different models against each other in secret, randomized battles. Users rate the models and vote for the answer they like best. This makes the rankings very useful because they are based on what users prefer.

This is the first time in about a year since GPT-4 was released that another language model has beaten GPT-4 in the chatbot arena. What's even more impressive is that the much cheaper Anthropic Haiku model comes close to GPT-4.

Haiku is much better, especially at generating text, when you give it a lot of examples, it costs about ten times less than GPT-4 and is as good as the first GPT-4 version 0314 from March 2023, if the LMSys crowd is right. Anthropic's mid-range Sonnet also beats the original GPT-4.

Image: LMsys (Screenshot 27/03/24)

Thanks to Claude 3 Opus, OpenAI's lead is slipping, which wasn't obvious in recent months, especially with Google's mostly underwhelming unveiling of Gemini. But OpenAI is still taking over the market with its models, especially with ChatGPT for regular users.

Recommendation

However, Anthropic is likely to catch up quickly in terms of API usage. The recent changes in OpenAI's leadership have already helped Anthropic by showing companies that they don't want to rely on just one AI company.

OpenAI is likely to counter soon: leaks suggest that the company could unveil a "smarter" model as early as this summer, which could be GPT-4.5 or GPT-5. OpenAI CEO Sam Altman confirmed that his company intends to launch an "amazing" AI model this year.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • Anthropic's Claude 3 Opus AI assistant has overtaken OpenAI's GPT-4 in the Chatbot Arena leaderboard. This is the first time GPT-4 has been knocked off the throne since its launch a year ago.
  • Developed by the Large Model System Organization (LMSYS), Chatbot Arena is a benchmark platform that compares the performance of large language models based on user preferences in anonymous, randomly selected duels.
  • Update: GPT-4 is back on top after the April 9th update.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.