
Google is testing a new experimental mode for Gemini 2.5 Pro that adds deeper reasoning capabilities and native audio output.


The new mode, called "Deep Think," is designed to help the model evaluate multiple hypotheses before answering a prompt. According to Google, it’s based on new research methods and is currently being tested with a limited group of Gemini API users.

Google says Gemini 2.5 Pro with Deep Think beats OpenAI’s o3 model on several tasks, including the USAMO 2025 math test, the LiveCodeBench programming benchmark, and MMMU, a test for multimodal reasoning.

Gemini 2.5 Pro's new "Deep Think" mode is designed to outperform OpenAI’s o3 model on math, coding, and multimodal tasks. | Image: Google DeepMind

Google has also upgraded its 2.5 Flash model, which is optimized for speed and efficiency. The latest version performs better on reasoning, multimodal tasks, and code generation, while using 20 to 30 percent fewer tokens for the same output.

The updated Gemini 2.5 Flash model offers better performance with reduced token usage, Google says. | Image: Google DeepMind

Gemini 2.5 Flash is now available in Google AI Studio, Vertex AI, and the Gemini app. General availability for production use is expected in early June.

New audio features and computer control

Both Gemini 2.5 Pro and Flash now support native text-to-speech with multiple speaker profiles. The voice output can capture subtle effects like whispers and emotional tone, and supports more than 24 languages. Developers can control accent, tone, and speaking style through the Live API.
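
To give a sense of the developer surface, here is a minimal sketch of multi-speaker speech output with Google's google-genai Python SDK. The preview model name, the voice names, and the exact config fields are assumptions drawn from Google's preview documentation, not from the announcement itself:

```python
# Hedged sketch: two-speaker text-to-speech via the google-genai SDK.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # assumed preview model name
    contents="Narrator: Welcome back. Guest: (whispering) Glad to be here.",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=[
                    types.SpeakerVoiceConfig(
                        speaker="Narrator",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
                        ),
                    ),
                    types.SpeakerVoiceConfig(
                        speaker="Guest",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
                        ),
                    ),
                ]
            )
        ),
    ),
)

# The model returns raw PCM audio as inline data on the first candidate.
with open("dialogue.pcm", "wb") as f:
    f.write(response.candidates[0].content.parts[0].inline_data.data)
```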

Two new features—"Affective Dialogue" and "Proactive Audio"—aim to make voice interactions feel more natural. Affective Dialogue allows the model to detect emotion in a user's voice and respond accordingly—whether that’s neutrally, empathetically, or in a cheerful tone.

Proactive Audio helps filter out background conversations, so the AI only responds when it's directly addressed. The goal is to reduce accidental interactions and make voice control more reliable.
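
In the Live API, both features appear as session options. Here is a hedged sketch, assuming the preview field names from Google's docs (enable_affective_dialog, proactivity) and an assumed native-audio preview model:

```python
# Hedged sketch: a Live API session with affective dialogue and proactive audio.
import asyncio
from google import genai
from google.genai import types

# The preview features reportedly require the v1alpha API surface (assumption).
client = genai.Client(api_key="YOUR_API_KEY", http_options={"api_version": "v1alpha"})

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,  # adapt tone to emotion detected in the user's voice
    proactivity=types.ProactivityConfig(
        proactive_audio=True  # stay silent unless the model is actually addressed
    ),
)

async def main():
    model = "gemini-2.5-flash-preview-native-audio-dialog"  # assumed model name
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text="Hey, are you there?")])
        )
        async for message in session.receive():
            if message.data:  # raw audio chunks streamed back by the model
                print(f"received {len(message.data)} bytes of audio")

asyncio.run(main())
```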

Google is also bringing features from Project Mariner into the Gemini API and Vertex AI, allowing the model to control computer applications like a web browser.


For developers, Gemini now includes "thought summaries," a structured view of the model’s internal reasoning and the actions it takes. To balance quality against latency and cost, developers can set "thinking budgets" that cap how many tokens the model spends on reasoning, or turn reasoning off entirely.
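
As a rough sketch of how that looks in the google-genai SDK; the thinking_budget and include_thoughts fields follow Google's preview docs, so treat the exact names and limits as assumptions:

```python
# Hedged sketch: cap reasoning tokens and request a summarized view of the thinking.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How many primes are there below 100?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,   # cap the tokens spent on reasoning; 0 disables it
            include_thoughts=True,  # ask for a thought summary alongside the answer
        )
    ),
)

# Thought summaries come back as parts flagged with `thought=True`.
for part in response.candidates[0].content.parts:
    label = "Thought summary" if part.thought else "Answer"
    print(f"{label}: {part.text}")
```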

The Gemini API also now supports Anthropic's Model Context Protocol (MCP), which could make it easier to integrate with open-source tools. Google is exploring hosted MCP servers to support agent-based application development.
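
In practice, that could look like the sketch below: newer google-genai releases can take an MCP client session directly as a tool and handle the tool-call round trips automatically (an experimental SDK feature, so treat it as an assumption). The local weather server is a hypothetical stand-in:

```python
# Hedged sketch: passing an MCP client session to Gemini as a tool.
import asyncio
from google import genai
from google.genai import types
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

client = genai.Client(api_key="YOUR_API_KEY")

# Hypothetical local MCP server that exposes a weather lookup tool.
server_params = StdioServerParameters(command="python", args=["weather_server.py"])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            response = await client.aio.models.generate_content(
                model="gemini-2.5-flash",
                contents="What's the weather in London right now?",
                config=types.GenerateContentConfig(
                    tools=[session],  # the SDK discovers and calls the MCP tools
                ),
            )
            print(response.text)

asyncio.run(main())
```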

Google's open-source model Gemma goes multimodal

Google has added a new entry to its AI lineup with Gemma 3n, a lightweight, open-source model built specifically for mobile devices like smartphones, tablets, and laptops. The model is based on a new architecture developed in collaboration with hardware partners including Qualcomm, MediaTek, and Samsung.

Gemma 3n is designed to deliver powerful multimodal capabilities while keeping resource usage low. The 5B and 8B parameter versions require just 2 to 3 GB of RAM, making them suitable for on-device applications.


The model supports text, audio, and image processing, and can handle tasks like transcription, translation, and processing of mixed inputs across different modalities. A key feature is "Mix-n-Match," which allows developers to extract smaller sub-models from the main architecture depending on the use case.

Google has also improved Gemma 3n’s multilingual capabilities, particularly for languages like German, Japanese, Korean, Spanish, and French. A preview version of the model is now available through Google AI Studio and the AI Edge Toolkit for local development.
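
For a quick first test against the hosted preview, a short example; the model identifier below is an assumption based on Google's E2B/E4B naming for Gemma 3n, so check AI Studio for the current string:

```python
# Hedged sketch: calling the hosted Gemma 3n preview through the Gemini API.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemma-3n-e4b-it",  # assumed preview identifier
    contents="Translate to German: The weather is nice today.",
)
print(response.text)
```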

Summary
  • Google is expanding its Gemini 2.5 Pro AI model to include the "Deep Think" experimental mode, which allows several hypotheses to be tested before an answer is given.
  • According to Google, the model outperforms OpenAI's o3 in benchmarks for mathematical, programming-related and multimodal tasks.
  • The faster and cheaper Gemini 2.5 Flash model has also been improved and now achieves better results in reasoning, multimodal tasks, code generation and long contexts. In addition, both models now support native audio output.