
Google DeepMind has introduced a new AI model capable of operating web and mobile interfaces. The Gemini 2.5 Computer Use model is now available in preview.


Developers can access it through the Gemini API. Built on Gemini 2.5 Pro, the model is designed to help agents interact directly with graphical user interfaces.

It works in a continuous loop: the system receives a screenshot of the environment, the user's request, and a record of past actions. From this, it generates UI actions like clicking, typing, or scrolling. After each action, a new screenshot is sent back to the model, and the process repeats.
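The loop described above can be sketched in a few lines of Python. This is an illustrative stand-in, not the actual Gemini API: the function names, the `Action` type, and the toy policy are all assumptions made for the sake of the example.

```python
# Hedged sketch of the observe-act loop: screenshot + request + action
# history go in, a UI action comes out, and a fresh screenshot is fed
# back after each step. All names here are illustrative, not the real API.

from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                      # e.g. "click", "type", "scroll", "done"
    payload: dict = field(default_factory=dict)

def propose_action(screenshot: bytes, request: str,
                   history: list[Action]) -> Action:
    """Stand-in for the model call: given the current screenshot, the
    user's request, and the record of past actions, return the next
    UI action."""
    # Toy policy for demonstration: click once, then finish.
    if not history:
        return Action("click", {"x": 100, "y": 200})
    return Action("done")

def execute(action: Action) -> bytes:
    """Stand-in for the browser or device: perform the action and
    return a new screenshot of the environment."""
    return b"<screenshot bytes>"

def run_agent(request: str, max_steps: int = 10) -> list[Action]:
    screenshot = execute(Action("noop"))       # initial observation
    history: list[Action] = []
    for _ in range(max_steps):
        action = propose_action(screenshot, request, history)
        history.append(action)
        if action.kind == "done":
            break
        screenshot = execute(action)           # feed the new state back in
    return history
```

A real implementation would replace `propose_action` with a call to the model and `execute` with a browser automation layer, but the control flow is the same repeating observe-act cycle.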

Google says the model is primarily optimized for web browsers but can also handle mobile UI control. It is not yet intended for desktop operating system-level tasks.

According to Google, the model outperforms alternatives in benchmarks like Online-Mind2Web, WebVoyager, and AndroidWorld. These results come from internal tests and evaluations by Browserbase. It reportedly reaches over 70 percent accuracy with an average latency of about 225 seconds.

Safety mechanisms against misuse

Google identifies three main risks: intentional misuse by users, unexpected model behavior, and prompt injections on the web. The company says it has built safety features directly into the model.

A per-step safety service reviews every proposed action before execution. Developers can also use system instructions to require user confirmation or block specific high-stakes actions, such as bypassing CAPTCHAs or controlling medical devices.
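The per-step review can be pictured as a gate between the model's proposal and its execution. The sketch below is a hypothetical illustration of that pattern, not Google's actual safety service; the action names and the three-way verdict are assumptions for the example.

```python
# Illustrative per-step gating: every proposed action is reviewed before
# execution. Developer policy can block certain high-stakes actions
# outright or require explicit user confirmation first.
# These category sets are hypothetical examples, not Google's real lists.

BLOCKED = {"bypass_captcha", "control_medical_device"}
NEEDS_CONFIRMATION = {"make_purchase", "send_email"}

def review_action(kind: str, user_confirms=lambda kind: False) -> str:
    """Return 'execute', 'blocked', or 'rejected' for a proposed action.

    `user_confirms` stands in for a UI prompt asking the end user to
    approve a high-stakes step; by default it declines.
    """
    if kind in BLOCKED:
        return "blocked"                      # never executed
    if kind in NEEDS_CONFIRMATION:
        return "execute" if user_confirms(kind) else "rejected"
    return "execute"                          # ordinary UI action
```

In an agent loop, this check would run on every action before it reaches the browser, so a single unsafe proposal never executes even if the model generates it.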

Google is already using the model internally for UI testing, Project Mariner, the Firebase Testing Agent, and the AI Mode in Search. Gemini 2.5 Computer Use is available through Google AI Studio and Vertex AI, with a demo environment hosted by Browserbase.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Google DeepMind has launched the Gemini 2.5 Computer Use model in preview, allowing AI agents to directly control web and mobile interfaces by analyzing screenshots, user requests, and previous actions to generate clicks, typing, or scrolling in a continuous feedback loop.
  • The model is primarily optimized for web browsers, with some support for mobile apps, and is not designed for desktop operating system-level tasks; developers can access it via the Gemini API, Google AI Studio, and Vertex AI, with a public demo available through Browserbase.
  • In internal and third-party benchmarks, Gemini 2.5 Computer Use achieves over 70 percent accuracy and around 225 seconds average latency, while Google has built in safety measures such as per-step action review and developer controls to prevent misuse, including restrictions on actions like bypassing CAPTCHAs or controlling sensitive devices.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.