Google DeepMind has introduced a new AI model capable of operating web and mobile interfaces. The Gemini 2.5 Computer Use model is now available in preview.
Developers can access it through the Gemini API. Built on Gemini 2.5 Pro, the model is designed to help agents interact directly with graphical user interfaces.
It works in a continuous loop: the model receives a screenshot of the current environment, the user's request, and a record of past actions. From these, it generates a UI action such as a click, keystroke, or scroll. After each action, a new screenshot is sent back to the model, and the process repeats until the task is complete.
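To make the loop concrete, here is a minimal client-side sketch in Python. The helper names (`take_screenshot`, `request_action`, `execute_action`) and the `Action` structure are illustrative assumptions, not the actual Gemini API surface; a real client would send the screenshot, goal, and history through the Gemini API and parse the model's response.

```python
from dataclasses import dataclass, field

# --- Illustrative stand-ins; the real Gemini API surface differs. ---

@dataclass
class Action:
    kind: str                      # e.g. "click", "type", "scroll", "done"
    args: dict = field(default_factory=dict)

def take_screenshot() -> bytes:
    """Capture the current browser viewport (stubbed here)."""
    return b"<png bytes>"

def request_action(screenshot: bytes, goal: str, history: list[Action]) -> Action:
    """Ask the model for the next UI action given the current state.
    A real client would call the Gemini API here."""
    return Action(kind="done")     # stub: immediately signal completion

def execute_action(action: Action) -> None:
    """Perform the proposed action in the browser (stubbed here)."""
    print(f"executing {action.kind} {action.args}")

def run_agent(goal: str, max_steps: int = 50) -> None:
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                       # 1. observe
        action = request_action(screenshot, goal, history)   # 2. model proposes
        if action.kind == "done":                            # task finished
            break
        execute_action(action)                               # 3. act, then loop
        history.append(action)

run_agent("Find the cheapest flight from Berlin to Lisbon")
```

The key design point is that the model never touches the browser directly: the client executes each action and feeds a fresh screenshot back, so every step is observable and interruptible.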
Google says the model is primarily optimized for web browsers but can also handle mobile UI control. It is not yet intended for desktop operating system-level tasks.
According to Google, the model outperforms alternatives on benchmarks such as Online-Mind2Web, WebVoyager, and AndroidWorld. These results come from internal tests and evaluations by Browserbase. It reportedly reaches over 70 percent accuracy with an average latency of about 225 seconds.
Safety mechanisms against misuse
Google identifies three main risks: intentional misuse by users, unexpected model behavior, and prompt injections on the web. The company says it has built safety features directly into the model.
A per-step safety service reviews every proposed action before execution. Developers can also use system instructions to require user confirmation or block specific high-stakes actions, such as bypassing CAPTCHAs or controlling medical devices.
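On the client side, that confirmation flow might look like the following sketch. The `classify_risk` helper and the `HIGH_STAKES` set are hypothetical, standing in for Google's built-in per-step safety service and for developer-defined system instructions; they are not part of the actual API.

```python
# Hypothetical client-side gate mirroring the per-step safety flow.
# classify_risk and HIGH_STAKES are illustrative assumptions.

HIGH_STAKES = {"bypass_captcha", "control_medical_device"}

def classify_risk(action_kind: str) -> str:
    """Stand-in for the per-step safety review of a proposed action."""
    if action_kind in HIGH_STAKES:
        return "block"             # never execute
    if action_kind in {"submit_payment", "delete_account"}:
        return "confirm"           # require explicit user sign-off
    return "allow"

def gate_action(action_kind: str) -> bool:
    """Return True if the action may be executed."""
    verdict = classify_risk(action_kind)
    if verdict == "block":
        print(f"blocked: {action_kind}")
        return False
    if verdict == "confirm":
        answer = input(f"Allow '{action_kind}'? [y/N] ")
        return answer.strip().lower() == "y"
    return True

# Example: an ordinary click passes, a CAPTCHA bypass is refused.
assert gate_action("click")
assert not gate_action("bypass_captcha")
```

The split mirrors the article's description: some actions are blocked outright, others are paused for human confirmation, and the rest proceed automatically.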
Google is already using the model internally for UI testing, Project Mariner, the Firebase Testing Agent, and the AI Mode in Search. Gemini 2.5 Computer Use is available through Google AI Studio and Vertex AI, with a demo environment hosted by Browserbase.