Google DeepMind today announced the next generation of its Gemini AI model. Gemini 2.0 Flash Experimental is available now in the Gemini web chat app, and to developers and select testers through the Gemini API in Google AI Studio and Vertex AI; a broader release is planned for early 2025.
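For developers wanting to try the experimental model through the Gemini API, the request shape below is a minimal sketch against the public `generateContent` REST endpoint. The model ID `gemini-2.0-flash-exp` and the `GEMINI_API_KEY` environment variable are assumptions based on Google's published naming conventions, not details confirmed in this announcement.

```python
import json
import os
import urllib.request

# Experimental model ID (assumption based on Google's naming scheme).
MODEL = "gemini-2.0-flash-exp"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble a minimal generateContent request for the Gemini REST API."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_request("Explain multimodal models in one sentence.",
                        os.environ.get("GEMINI_API_KEY", ""))
    # Actually sending the request requires a valid API key:
    # with urllib.request.urlopen(req) as resp:
    #     data = json.load(resp)
```

Google's official SDKs (such as the `google-generativeai` Python package) wrap this same endpoint with a higher-level interface.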
The new version brings significant improvements in multimodal capability: it processes text, images, video, and audio, and natively generates images and multilingual speech. Google plans to integrate Gemini 2.0 into its AI Overviews—infamously known for their mixed accuracy—to handle more complex topics and multi-step questions, including advanced math, multimodal queries, and coding challenges.
According to Google, Gemini 2.0 Flash runs twice as fast as Gemini 1.5 Pro. While it nearly matches Anthropic's Sonnet "3.6" on benchmarks, it is likely to be much cheaper, judging by Google's pricing for Gemini 1.5 Flash. Keep in mind that benchmark results often differ from real-world performance.
Google is rolling out a chat-optimized version of Gemini 2.0 Flash Experimental to all Gemini users through desktop and mobile web browsers. The company plans to add mobile app integration in the near future.
For developers, Google plans to integrate Gemini 2.0 into various platforms including Android Studio, Chrome DevTools, and Firebase. The enhanced coding support, called Gemini Code Assist, will be available in popular integrated development environments such as Visual Studio Code, IntelliJ, and PyCharm.
Three specialized AI agents
Along with Gemini 2.0, Google has introduced two new research prototypes, plus an update to the previously announced Project Astra, that together showcase Gemini 2.0's agentic capabilities.
Project Mariner functions as an experimental Chrome extension designed for web-based tasks. The prototype has demonstrated strong performance, achieving an 83.5 percent success rate in real-world testing scenarios. To maintain security, the agent can only operate within the active browser tab and requires explicit user confirmation for sensitive actions such as purchases.
The second agent, Jules, focuses on supporting developers through GitHub workflow integration. This agent can work asynchronously, develop multi-stage troubleshooting plans, and prepare pull requests. Currently, Jules is available only to a select group of testers.
Project Astra, which Google had previously announced, will take advantage of Flash's speed and multimodal capabilities. This universal AI assistant can maintain multilingual conversations with up to ten minutes of context memory. The system integrates with Google Search, Lens, and Maps to provide comprehensive assistance.
Google is also upgrading its existing data science agent for Google Colab to use Gemini 2.0. The agent automatically generates analyses from user descriptions; Google claims that in a recent project at Lawrence Berkeley National Laboratory, it cut analysis time from a week to minutes. Developers interested in testing the agent can request access.
Gaming and robotics experiments
Additionally, Google DeepMind is testing Gemini 2.0 in video games, where agents provide real-time strategic advice to players by analyzing screen content. The speed of the Flash model makes these real-time applications possible. The company also plans to test the model's enhanced spatial reasoning capabilities in robotics applications.
Google launches "Deep Research" for Gemini Advanced
Google has also introduced Deep Research for Gemini Advanced subscribers. This new agent-based feature automates complex searches and quickly generates comprehensive reports.
The company says the system is designed to mimic human research methods: searching, analyzing information, and initiating new queries based on findings. Results appear in structured reports with sources and can be exported to Google Docs. The feature combines Google's search technology with Gemini's analysis capabilities and a context window of 1 million tokens.
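The iterative loop Google describes (search, analyze, then issue follow-up queries based on what was found) can be sketched generically. This is a toy illustration only: the `search` and `followups` callables are hypothetical stand-ins supplied by the caller, not Google APIs, and the real Deep Research system is far more elaborate.

```python
from typing import Callable

def deep_research(topic: str,
                  search: Callable[[str], list[str]],
                  followups: Callable[[str, list[str]], list[str]],
                  max_rounds: int = 3) -> dict[str, list[str]]:
    """Toy sketch of an iterative research loop: run a query,
    analyze the results, then queue follow-up queries."""
    findings: dict[str, list[str]] = {}
    queue = [topic]
    for _ in range(max_rounds):
        if not queue:
            break
        query = queue.pop(0)
        results = search(query)          # gather sources for this query
        findings[query] = results
        # derive new queries from the results, skipping ones already covered
        queue += [q for q in followups(query, results)
                  if q not in findings and q not in queue]
    return findings   # structured findings, ready to format as a report
```

The `max_rounds` cap stands in for whatever budget a production agent would use to stop the loop before it wanders indefinitely.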