
Google's experimental AI model Gemini 2.0 Flash Thinking has jumped ahead of its competitors, scoring impressive results in math, science, and general performance tests.


According to testing platform lmarena.ai, the latest version of Gemini has made significant gains in the Chatbot Arena, improving its score by 17 points since December 2024. This puts it ahead of competitors like OpenAI's GPT-4o models and Anthropic's Claude 3.5 Sonnet.

The model shows improvements across nearly all categories, taking the lead in complex tasks, programming, and creative writing. The only area where it still needs work is style control - how it formats its outputs.

Under the hood, Google says they've added new features like code execution and expanded the model's context window to handle up to one million tokens. They've also improved how well the model's thinking process lines up with its final responses.


Google relies on years of experience with planning systems

Google DeepMind's CEO Demis Hassabis says this progress builds on more than ten years of work with AI planning systems, going all the way back to AlphaGo. By combining these tried-and-true planning methods with modern foundation models, they've seen particularly strong results in math and science testing.

This update follows the first version of Gemini 2.0 Flash Thinking, which Google launched in December 2024. That version introduced explicit thought processes that help the model improve its reasoning, and it also performed well in testing.

Summary
  • Google DeepMind has released a new version of its Gemini 2.0 Flash Thinking AI model, which has taken the top spot in the Chatbot Arena, according to ratings platform lmarena.ai, with a 17-point improvement over the last checkpoint in December 2024.
  • Technical improvements include code execution, an expanded context window of one million tokens, and improved consistency between thinking processes and responses. In particular, the significantly increased context window should open up new application possibilities.
  • Demis Hassabis, CEO of Google DeepMind, emphasizes that this development builds on more than a decade of experience in AI planning systems. The combination of these proven approaches with modern foundation models is proving to be particularly powerful, especially in math and science benchmarks.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.