A powerful new AI model called "gpt2-chatbot" shows capabilities that appear to be at or above the level of GPT-4.
The model appeared without much fanfare in the LMSYS Org Chatbot Arena, a website that compares AI language models, but its performance quickly caught the attention of testers.
"I would agree with assessments that it is at least GPT-4 level," says Andrew Gao, an AI researcher at Stanford University who has been tracking the model on LMSYS since its release.
For example, gpt2-chatbot solved a problem from the prestigious International Mathematical Olympiad (IMO) on the first try. Gao called the IMO itself "insanely hard."
uh.... gpt2-chatbot just solved an International Math Olympiad (IMO) problem in one-shot
the IMO is insanely hard. only the FOUR best math students in the USA get to compete
prompt + its thoughts 🧵 https://t.co/CuO0ToJmb9 pic.twitter.com/3xxWPvtmuG
— Andrew Gao (@itsandrewgao) April 29, 2024
According to Ethan Mollick, a professor at the Wharton School, the model seems to perform better than GPT-4 Turbo on complex reasoning tasks such as writing code. Chase McCoy, a founding engineer at CodeGen, said that gpt2-chatbot "is definitely better at complex code manipulation tasks than Claude Opus or the latest GPT4. Did better on all the coding prompts we use to test new models."
More examples have circulated on Twitter. Alvaro Cintas, for instance, had the model generate a working Snake game on the first attempt.
This was the game it gave me! Code it right in the first try pic.twitter.com/ihMhBR9BAo
— Alvaro Cintas (@dr_cintas) April 29, 2024
Sully Omar, co-founder of Cognosys, had the model draw a unicorn, a test taken from Microsoft's controversial "Sparks of AGI" paper.
Gpt2 drawing unicorns vs Claude opus
Whatever this model is, its really good. pic.twitter.com/XHDMWaFdW9
— Sully (@SullyOmarr) April 29, 2024
GPT-4.5 or something entirely different?
Its strong performance, together with clues that it uses OpenAI's tokenizer, suggests that gpt2-chatbot may come from OpenAI and could be a test of GPT-4.5 or another new model from the company. LMSYS confirmed that it allows model providers to test their models anonymously. The model also describes itself as ChatGPT and "based on GPT-4."
However, self-descriptions of AI models are not always reliable, and some testers report more hallucinations than GPT-4 Turbo. OpenAI CEO Sam Altman responded to the rumors with a post on X: "I have a soft spot for gpt2." In short, although the similarities to earlier OpenAI creations suggest a possible connection, conclusive evidence is still lacking.
It is also possible that a lesser-known group released the model to showcase what it can do and attract attention.