Google's experimental AI model Gemini 2.0 Flash Thinking has jumped ahead of its competitors, scoring impressive results in math, science, and general performance tests.
According to testing platform lmarena.ai, the latest version of Gemini has made significant gains in the Chatbot Arena, improving its score by 17 points since December 2024. This puts it ahead of competitors like OpenAI's GPT-4o models and Anthropic's Claude 3.5 Sonnet.
The model improves across nearly all categories, taking the lead in complex tasks, programming, and creative writing. The only area where it still lags is style control: how well the model formats its outputs.
Under the hood, Google says it has added new features such as code execution and expanded the model's context window to one million tokens. The company has also improved how closely the model's thinking process aligns with its final responses.
Google relies on years of experience with planning systems
Google DeepMind CEO Demis Hassabis says this progress builds on more than a decade of work on AI planning systems, going all the way back to AlphaGo. By combining these tried-and-true planning methods with modern foundation models, the company has seen particularly strong results in math and science testing.
Our latest update to our Gemini 2.0 Flash Thinking model (available here: https://t.co/Rr9DvqbUdO) scores 73.3% on AIME (math) & 74.2% on GPQA Diamond (science) benchmarks. Thanks for all your feedback, this represents super fast progress from our first release just this past...
- Demis Hassabis (@demishassabis) January 21, 2025
This update follows the first version of Gemini 2.0 Flash Thinking, which Google launched in December 2024. That initial release introduced explicit thought processes that help the model improve its reasoning, and it also performed well in testing.