Anthropic's Claude 3 beats OpenAI's GPT-4. Right? In the benchmarks published by the company, the largest model, Opus, beats GPT-4, but a closer look reveals that it is complicated: Anthropic tested its latest model against the first version of GPT-4, not newer versions like GPT-4 Turbo. The reason: OpenAI has so far only published benchmarks for the old GPT-4 model, which can only be accessed via the API. However, there are GPT-4 Turbo results for some benchmarks that do not come directly from OpenAI. AI researcher Lawrence Chan has compiled them. A look at these numbers makes it clear: In every benchmark where Claude 3 and GPT-4 Turbo were compared, the OpenAI model still beats the best model from Anthropic - even if only by a few percentage points. However, the models are so close that the question of which model is better depends very much on the task at hand - and is mostly a matter of taste.

Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.