Anthropics Claude 3 lags behind GPT-4 Turbo

Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.

Profile

E-Mail

Anthropic's Claude 3 beats OpenAI's GPT-4. Right? In the benchmarks published by the company, the largest model, Opus, beats GPT-4, but a closer look reveals that it is complicated: Anthropic tested its latest model against the first version of GPT-4, not newer versions like GPT-4 Turbo. The reason: OpenAI has so far only published benchmarks for the old GPT-4 model, which can only be accessed via the API. However, there are GPT-4 Turbo results for some benchmarks that do not come directly from OpenAI. AI researcher Lawrence Chan has compiled them. A look at these numbers makes it clear: In every benchmark where Claude 3 and GPT-4 Turbo were compared, the OpenAI model still beats the best model from Anthropic - even if only by a few percentage points. However, the models are so close that the question of which model is better depends very much on the task at hand - and is mostly a matter of taste.

Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:

Bank transfer

Sources

LessWrong (LawrenceC)

Maximilian Schreiner

Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.

Profile

E-Mail