Google DeepMind has introduced Gemini 2.5 Pro, which the company describes as its most capable AI model to date.
According to Google, the new model already leads numerous benchmarks by significant margins, including Chatbot Arena, which ranks models by human preference.
The model is Google's first major reasoning model, following initial experiments with Gemini 2.0 Flash Thinking. Google intends to build these reasoning capabilities directly into all of its future models.
Performance across multiple domains
Gemini 2.5 Pro demonstrates strong capabilities in various areas, Google says. Without specialized optimization, the model achieves solid results in mathematical and scientific tests like GPQA and AIME. It scores 18.8% on the challenging "Humanity's Last Exam" - the highest score among models without additional tools.
For programming tasks, Google claims Gemini 2.5 Pro particularly excels at web app development and code transformation. With a customized agent configuration, it achieves 63.8% on SWE-Bench Verified. However, Anthropic's Claude 3.7 Sonnet Thinking still outperforms Google's model on that benchmark. To demonstrate the model's coding ability, Google shows it generating functional game code from a single-line instruction.
First true multimodal reasoning model
Like its predecessors, Gemini 2.5 Pro processes text, audio, images, video, and code - a diversity of inputs not yet matched by competing models. The model maintains Google's characteristic large context window of 1 million tokens, with plans to expand this to 2 million.
Developers and businesses can already experiment with Gemini 2.5 Pro in Google AI Studio. Gemini Advanced subscribers can select the model from the dropdown menu on both desktop and mobile devices. Google plans to announce availability in Vertex AI and pricing details in the coming weeks.