Artificial Analysis crowns winners in most comprehensive AI chatbot comparison to date

Sep 12, 2024

A new analysis by Artificial Analysis offers the most comprehensive comparison to date of leading AI chatbots, including ChatGPT, Claude, Bing Chat, and Poe. ChatGPT wins half of the six categories, while Claude takes two.

The comparison evaluated chatbots based on model intelligence, features, speed, and context window. For each chatbot, the analysts tested the model with the highest "Quality Index," an average of several benchmark scores. For example, ChatGPT uses OpenAI's latest GPT-4o model in both free and paid plans, while Claude Pro uses Claude 3.5 Sonnet, with access to the Claude 3 Haiku and Opus models.

Category	Chatbot	Price
Best chatbot overall	ChatGPT Plus	20 USD/month
Best free chatbot	ChatGPT	Free of charge
Best coding chatbot	Claude Pro	20 USD/month
Best image chatbot	Poe	20 USD/month
Best data processing	ChatGPT Plus	20 USD/month
Best context window	Claude Pro	20 USD/month

It's important to differentiate between the chatbot and the underlying language model. Sometimes, but not always, both come from the same manufacturer. Poe, whose provider Quora doesn't have its own language model at this level, uses GPT-4o in the test, just like ChatGPT.

However, the language model accessed through the programming interface (API) that OpenAI makes available to external developers can be more capable than it is in ChatGPT. To save resources, OpenAI deliberately limits its own chatbot's capabilities. This becomes clear in the context window, among other things, as seen below.

Best chatbot overall: ChatGPT Plus

In terms of general intelligence and reasoning ability, Anthropic's Claude Pro and Claude Free are just ahead of OpenAI's ChatGPT Plus and Free according to Artificial Analysis' Quality Index, which summarizes results from benchmarks such as MMLU, GPQA, Math and HumanEval.

However, for users willing to pay, ChatGPT Plus earned the "Best Chatbot Overall" title thanks to its strong combination of model intelligence and extensive features.

Best free chatbot: ChatGPT

ChatGPT Free was named "Best Free" chatbot because it offers limited access to OpenAI's advanced GPT-4o model with various features. Within the approximately 6 messages per hour that OpenAI grants users with GPT-4o, the free ChatGPT has full access to ChatGPT Plus's extensive feature set, making it the best free AI chatbot experience.

Best coding chatbot: Claude Pro

Anthropic's Claude Pro received two awards: "Best Coding" for its high scores in coding benchmarks and its long context for working with large codebases, and "Best Long Context" for its 200,000-token context window, the largest of all chatbots tested. Claude's support for the Claude 3.5 Sonnet base model and flexible file upload capabilities make it ideal for long-context reasoning and large file processing.

Best image chatbot: Poe

Poe, Quora's chatbot app, won "Best Image Processing" due to its integration of leading image generation models such as Flux-1, Ideogram v2 and Playground v3 Beta. Poe Pro supports various third-party language and image models, but not Midjourney, which remains one of the most flexible and high-quality image models.

Best data processing: ChatGPT Plus

ChatGPT Plus secured the "Best Data Processing" title as it combines GPT-4o's intelligence with a Python code interpreter to excel at data analysis tasks. Users can upload data files such as Excel and CSV directly into the code interpreter, and the model capably writes code to analyze the data and create charts.

Best context window: Claude Pro

For the longest usable context window, Claude Pro leads with 200,000 tokens. Poe Pro and Mistral Le Chat also impress with 180,000 and 40,000 tokens respectively. Most other chatbots range from 5,000 to 20,000 tokens.

The comparison showed that the effective context window available in many chatbot applications is significantly smaller than the full context window of the underlying base model. Longer context windows allow users to enter more data into the chatbot, e.g., by uploading longer documents.
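To make the practical effect of these limits concrete, here is a minimal Python sketch that checks which of the chatbots named above could hold a document of a given size. The window sizes are the figures reported in the comparison. The four-characters-per-token rule is only a rough rule of thumb for English text and an assumption here, as are the function names. The check also ignores the room the prompt and the answer need.

```python
# Effective context windows reported in the comparison, in tokens.
CONTEXT_WINDOWS = {
    "Claude Pro": 200_000,
    "Poe Pro": 180_000,
    "Mistral Le Chat": 40_000,
}

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers differ by model and language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a text (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def chatbots_that_fit(num_tokens: int, windows=CONTEXT_WINDOWS) -> list[str]:
    """Return the chatbots whose context window can hold num_tokens."""
    return [name for name, limit in windows.items() if num_tokens <= limit]


if __name__ == "__main__":
    document = "x" * 200_000  # hypothetical 200,000-character document
    tokens = estimate_tokens(document)
    print(tokens, chatbots_that_fit(tokens))
```

Under these assumptions, a 200,000-character document comes to about 50,000 tokens, which fits into Claude Pro and Poe Pro but not into Mistral Le Chat.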

In the past, however, language models with large context windows often didn't fully utilize them because information from the prompt could get lost.

Gemini is by far the fastest LLM

Speed tests show that Gemini in the free tier (Gemini 1.5 Flash) and Claude are the fastest, at 150 and 70 tokens/s respectively. ChatGPT and Bing perform solidly in the midfield at around 50 tokens/s, while newer entrants such as Grok only manage 10-20 tokens/s.
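To put these rates into perspective, the following Python sketch converts them into the time needed to stream one answer. The speeds are the approximate figures from the comparison. The 600-token answer length and the function name are hypothetical, and the calculation assumes a constant output rate with no delay before the first token.

```python
# Approximate output speeds reported in the comparison, in tokens per second.
# Grok's reported 10-20 tokens/s range is split into its two bounds.
SPEEDS_TOKENS_PER_S = {
    "Gemini (free, 1.5 Flash)": 150,
    "Claude": 70,
    "ChatGPT": 50,
    "Grok (upper bound)": 20,
    "Grok (lower bound)": 10,
}


def generation_seconds(num_tokens: int, tokens_per_s: float) -> float:
    """Seconds needed to stream num_tokens at a constant output rate."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return num_tokens / tokens_per_s


if __name__ == "__main__":
    answer_length = 600  # hypothetical answer length in tokens
    for name, speed in SPEEDS_TOKENS_PER_S.items():
        seconds = generation_seconds(answer_length, speed)
        print(f"{name}: {seconds:.1f} s")
```

Under these assumptions, a 600-token answer takes about 4 seconds at 150 tokens/s, 12 seconds at 50 tokens/s, and 30 to 60 seconds at 10-20 tokens/s.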

Comparing overall intelligence with the range of functions, ChatGPT takes first place in both paid and free versions. For the combination of model intelligence and context window, Claude Pro and Poe Pro significantly outperform the competition.
