Is OpenAI testing GPT-4.5? "gpt2-chatbot" writes better code than GPT-4 and Claude

A powerful new AI model called "gpt2-chatbot" shows capabilities that appear to be at or above the level of GPT-4.

The model appeared without much fanfare in the LMSYS Org Chatbot Arena, a website that compares AI language models. Its performance quickly caught the attention of testers.

"I would agree with assessments that it is at least GPT-4 level," says Andrew Gao, an AI researcher at Stanford University who has been tracking the model on LMSYS since its release.

For example, Gao reports that gpt2-chatbot solved a problem from the prestigious International Mathematical Olympiad on the first try. He described the competition as "insanely hard."

uh.... gpt2-chatbot just solved an International Math Olympiad (IMO) problem in one-shot

the IMO is insanely hard. only the FOUR best math students in the USA get to compete

prompt + its thoughts 🧵 https://t.co/CuO0ToJmb9 pic.twitter.com/3xxWPvtmuG

— Andrew Gao (@itsandrewgao) April 29, 2024

According to Ethan Mollick, a professor at the Wharton School, the model seems to perform better than GPT-4 Turbo on complex reasoning tasks such as writing code. Chase McCoy, founding engineer at CodeGen, said that gpt2-chatbot "is definitely better at complex code manipulation tasks than Claude Opus or the latest GPT4. Did better on all the coding prompts we use to test new models."

More examples are circulating on Twitter: Alvaro Cintas reported that the model produced a working Snake game on the first attempt.

This was the game it gave me! Code it right in the first try pic.twitter.com/ihMhBR9BAo

— Alvaro Cintas (@dr_cintas) April 29, 2024

Sully Omar, co-founder of Cognosys, had the model draw a unicorn - a test from Microsoft's controversial "Sparks of AGI" paper.

Gpt2 drawing unicorns vs Claude opus

Whatever this model is, its really good. pic.twitter.com/XHDMWaFdW9

— Sully (@SullyOmarr) April 29, 2024

GPT-4.5 or something entirely different?

The strong performance, together with clues pointing to OpenAI's tokenizer, suggests that gpt2-chatbot may come from OpenAI and could be a test of GPT-4.5 or another new model from the company. LMSYS confirmed that it allows model providers to test their models anonymously. The model also describes itself as ChatGPT and "based on GPT-4."

However, self-descriptions of AI models are not always reliable, and some testers report more hallucinations than GPT-4 Turbo. OpenAI CEO Sam Altman responded to the rumors with a post on X: "I have a soft spot for gpt2." In short, although the similarities to earlier OpenAI creations suggest a possible connection, conclusive evidence is still lacking.

So it's also possible that a lesser-known group released the model to demonstrate its capabilities and attract attention.
