Is OpenAI testing GPT-4.5? "gpt2-chatbot" writes better code than GPT-4 and Claude

A powerful new AI model called "gpt2-chatbot" shows capabilities that appear to be at or above the level of GPT-4.

The model appeared without much fanfare in the LMSYS Org Chatbot Arena, a website that compares AI language models, but its performance quickly caught the attention of testers.

"I would agree with assessments that it is at least GPT-4 level," says Andrew Gao, an AI researcher at Stanford University who has been tracking the model on LMSYS since its release.

For example, Gao reports that gpt2-chatbot solved a problem from the prestigious International Mathematical Olympiad (IMO) on the first try. He describes the IMO as "insanely hard."

uh.... gpt2-chatbot just solved an International Math Olympiad (IMO) problem in one-shot

the IMO is insanely hard. only the FOUR best math students in the USA get to compete

prompt + its thoughts 🧵 https://t.co/CuO0ToJmb9 pic.twitter.com/3xxWPvtmuG

— Andrew Gao (@itsandrewgao) April 29, 2024

According to Ethan Mollick, a professor at the Wharton School, the model seems to perform better than GPT-4 Turbo on complex reasoning tasks such as writing code. Chase McCoy, founding engineer at CodeGen, said that gpt2-chatbot "is definitely better at complex code manipulation tasks than Claude Opus or the latest GPT4. Did better on all the coding prompts we use to test new models."

There are more examples on Twitter: Alvaro Cintas had the model generate a Snake game, and the code worked on the first attempt.

This was the game it gave me! Code it right in the first try pic.twitter.com/ihMhBR9BAo

— Alvaro Cintas (@dr_cintas) April 29, 2024

Sully Omar, co-founder of Cognosys, had the model draw a unicorn - a test from Microsoft's controversial "Sparks of AGI" paper.

Gpt2 drawing unicorns vs Claude opus

Whatever this model is, its really good. pic.twitter.com/XHDMWaFdW9

— Sully (@SullyOmarr) April 29, 2024

GPT-4.5 or something entirely different?

The strong performance, together with signs that the model uses OpenAI's tokenizer, suggests that gpt2-chatbot may come from OpenAI and could be a test of GPT-4.5 or another new model from the company. LMSYS confirmed that it allows model providers to test their models anonymously. The model also describes itself as ChatGPT and "based on GPT-4."

However, self-descriptions of AI models are not always reliable, and some testers report more hallucinations than GPT-4 Turbo. OpenAI CEO Sam Altman responded to the rumors with a post on X: "I have a soft spot for gpt2." In short, although the similarities to earlier OpenAI creations suggest a possible connection, conclusive evidence is still lacking.

So it's also possible that a lesser-known group released the model to demonstrate their capabilities and attract attention.
