
A powerful new AI model called "gpt2-chatbot" shows capabilities that appear to be at or above the level of GPT-4.


The model appeared without much fanfare in the LMSYS Org Chatbot Arena, a website that compares AI language models. However, its performance quickly caught the attention of testers.

"I would agree with assessments that it is at least GPT-4 level," says Andrew Gao, an AI researcher at Stanford University who has been tracking the model on LMSYS since its release.

For example, gpt2-chatbot solved a problem from the prestigious International Mathematical Olympiad on the first try, a problem Gao described as "insanely hard."


According to Ethan Mollick, a professor at the Wharton School, the model seems to perform better than GPT-4 Turbo on complex reasoning tasks such as writing code. Chase McCoy, founding engineer at CodeGen, said that gpt2-chatbot "is definitely better at complex code manipulation tasks than Claude Opus or the latest GPT4. Did better on all the coding prompts we use to test new models."

There are more examples on Twitter: Alvaro Cintas had the model generate a Snake game on the first attempt.

Sully Omar, co-founder of Cognosys, had the model draw a unicorn, a test taken from Microsoft's controversial "Sparks of AGI" paper.

GPT-4.5 or something entirely different?

The strong performance, along with clues that the model uses an OpenAI tokenizer, suggests that gpt2-chatbot may come from OpenAI and could be a test of GPT-4.5 or another new model from the company. LMSYS confirmed that it also allows model providers to test their models anonymously. The model also describes itself as ChatGPT and "based on GPT-4."

However, self-descriptions of AI models are not always reliable, and some testers report more hallucinations than with GPT-4 Turbo. OpenAI CEO Sam Altman responded to the rumors with a post on X: "I have a soft spot for gpt2." In short, although the similarities to earlier OpenAI creations suggest a possible connection, conclusive evidence is still lacking.


So it's also possible that a lesser-known group released the model to demonstrate their capabilities and attract attention.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • A powerful new AI model called "gpt2-chatbot" has appeared in LMSYS.org's Chatbot Arena. According to users, it shows capabilities in some areas that go beyond those of GPT-4.
  • The model solved a difficult math problem on the first try and, according to some testers, performs better than GPT-4 or Anthropic's Claude at programming.
  • Based on similarities to previous OpenAI models, it is speculated that gpt2-chatbot could be a test for GPT-4.5 or a new OpenAI model.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.