Updated GPT-4 is ahead of Claude 3 Opus in the Chatbot Arena benchmark

Update from April 12, 2024:

OpenAI's recently released GPT-4 update has retaken the top spot in the Chatbot Arena. According to the publisher of the benchmark, the new GPT-4 version shows superior performance, especially in coding and reasoning. However, Claude 3 Opus is still ahead on inputs of more than 500 tokens (Longer Query Arena).

Original article dated March 27, 2024:

Anthropic's Claude 3 replaces OpenAI's GPT-4 as most popular user-rated LLM

Anthropic's Claude 3 Opus has knocked OpenAI's GPT-4 off the top of the chatbot leaderboard for the first time.

According to the Chatbot Arena Leaderboard, Claude 3 Opus now ranks first based on how real users rate chatbot answers, pushing GPT-4 down to second place.

The Chatbot Arena is a benchmark platform created by the Large Model Systems Organization (LMSYS) to compare the performance of large language models. The Arena pits different models against each other in anonymous, randomized battles. Users compare the two answers and vote for the one they like best. The rankings are useful because they reflect what users actually prefer.
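
To illustrate how pairwise votes like these can be turned into a ranking, here is a minimal sketch that assumes a simple Elo-style update. The step size K, the starting rating, the function names, and the model names are all hypothetical and chosen for illustration; this is not LMSYS's actual code, and the real leaderboard computation may differ.

```python
from collections import defaultdict

K = 32          # assumed update step size, not taken from the article
BASE = 1000.0   # assumed starting rating for every model


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rate(battles):
    """Compute ratings from (model_a, model_b, winner) votes.

    winner is "a", "b", or "tie".
    """
    ratings = defaultdict(lambda: BASE)
    for model_a, model_b, winner in battles:
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        delta = K * (score_a - exp_a)
        # Whatever A gains, B loses, so the total rating stays constant.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


def leaderboard(battles):
    """Return model names sorted from highest to lowest rating."""
    ratings = rate(battles)
    return sorted(ratings, key=ratings.get, reverse=True)


# Hypothetical votes between three placeholder models.
votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-z", "a"),
    ("model-y", "model-z", "a"),
]
print(leaderboard(votes))
```

In this toy run, the model that wins both of its battles ends up on top. An upset against a higher-rated model moves the ratings more than an expected win, which is why a newcomer can climb quickly once it starts beating the leader.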

This is the first time since GPT-4's release about a year ago that another language model has beaten it in the Chatbot Arena. What's even more impressive is that Anthropic's much cheaper Haiku model comes close to GPT-4.

Haiku performs much better, especially at generating text, when you give it a lot of examples. It costs about ten times less than GPT-4 and, if the LMSYS crowd is right, is as good as the first GPT-4 version (0314) from March 2023. Anthropic's mid-range Sonnet also beats the original GPT-4.

Thanks to Claude 3 Opus, OpenAI's lead is slipping, which wasn't obvious in recent months, especially after Google's mostly underwhelming unveiling of Gemini. But OpenAI still dominates the market with its models, especially with ChatGPT among regular users.

However, Anthropic is likely to catch up quickly in terms of API usage. The recent changes in OpenAI's leadership have already helped Anthropic by making companies realize that they don't want to rely on just one AI provider.

OpenAI is likely to counter soon: leaks suggest that the company could unveil a "smarter" model as early as this summer, which could be GPT-4.5 or GPT-5. OpenAI CEO Sam Altman confirmed that his company intends to launch an "amazing" AI model this year.