
Google DeepMind has introduced a new AI model capable of operating web and mobile interfaces. The Gemini 2.5 Computer Use model is now available in preview.


Developers can access it through the Gemini API. Built on Gemini 2.5 Pro, the model is designed to help agents interact directly with graphical user interfaces.

It works in a continuous loop: the system receives a screenshot of the environment, the user's request, and a record of past actions. From this, it generates UI actions like clicking, typing, or scrolling. After each action, a new screenshot is sent back to the model, and the process repeats.
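The loop described above can be sketched in a few lines of Python. This is an illustrative stand-in, not the actual Gemini API: the function names, the `Action` type, and the toy policy are all assumptions made for the sake of the example.

```python
# Hedged sketch of the observe-act loop: screenshot + request + action
# history go in, a UI action comes out, and a fresh screenshot is fed
# back after each step. All names here are illustrative, not the real API.

from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                      # e.g. "click", "type", "scroll", "done"
    payload: dict = field(default_factory=dict)

def propose_action(screenshot: bytes, request: str,
                   history: list[Action]) -> Action:
    """Stand-in for the model call: given the current screenshot, the
    user's request, and the record of past actions, return the next
    UI action."""
    # Toy policy for demonstration: click once, then finish.
    if not history:
        return Action("click", {"x": 100, "y": 200})
    return Action("done")

def execute(action: Action) -> bytes:
    """Stand-in for the browser or device: perform the action and
    return a new screenshot of the environment."""
    return b"<screenshot bytes>"

def run_agent(request: str, max_steps: int = 10) -> list[Action]:
    screenshot = execute(Action("noop"))       # initial observation
    history: list[Action] = []
    for _ in range(max_steps):
        action = propose_action(screenshot, request, history)
        history.append(action)
        if action.kind == "done":
            break
        screenshot = execute(action)           # feed the new state back in
    return history
```

A real implementation would replace `propose_action` with a call to the model and `execute` with a browser automation layer, but the control flow is the same repeating observe-act cycle.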

Google says the model is primarily optimized for web browsers but can also handle mobile UI control. It is not yet intended for desktop operating system-level tasks.

According to Google, the model outperforms alternatives in benchmarks like Online-Mind2Web, WebVoyager, and AndroidWorld. These results come from internal tests and evaluations by Browserbase. It reportedly reaches over 70 percent accuracy with an average latency of about 225 seconds.

Safety mechanisms against misuse

Google identifies three main risks: intentional misuse by users, unexpected model behavior, and prompt injections on the web. The company says it has built safety features directly into the model.

A per-step safety service reviews every proposed action before execution. Developers can also use system instructions to require user confirmation or block specific high-stakes actions, such as bypassing CAPTCHAs or controlling medical devices.
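The per-step review can be pictured as a gate between the model's proposal and its execution. The sketch below is a hypothetical illustration of that pattern, not Google's actual safety service; the action names and the three-way verdict are assumptions for the example.

```python
# Illustrative per-step gating: every proposed action is reviewed before
# execution. Developer policy can block certain high-stakes actions
# outright or require explicit user confirmation first.
# These category sets are hypothetical examples, not Google's real lists.

BLOCKED = {"bypass_captcha", "control_medical_device"}
NEEDS_CONFIRMATION = {"make_purchase", "send_email"}

def review_action(kind: str, user_confirms=lambda kind: False) -> str:
    """Return 'execute', 'blocked', or 'rejected' for a proposed action.

    `user_confirms` stands in for a UI prompt asking the end user to
    approve a high-stakes step; by default it declines.
    """
    if kind in BLOCKED:
        return "blocked"                      # never executed
    if kind in NEEDS_CONFIRMATION:
        return "execute" if user_confirms(kind) else "rejected"
    return "execute"                          # ordinary UI action
```

In an agent loop, this check would run on every action before it reaches the browser, so a single unsafe proposal never executes even if the model generates it.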

Google is already using the model internally for UI testing, Project Mariner, the Firebase Testing Agent, and the AI Mode in Search. Gemini 2.5 Computer Use is available through Google AI Studio and Vertex AI, with a demo environment hosted by Browserbase.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Google DeepMind has launched the Gemini 2.5 Computer Use model in preview, allowing AI agents to directly control web and mobile interfaces by analyzing screenshots, user requests, and previous actions to generate clicks, typing, or scrolling in a continuous feedback loop.
  • The model is primarily optimized for web browsers, with some support for mobile apps, and is not designed for desktop operating system-level tasks; developers can access it via the Gemini API, Google AI Studio, and Vertex AI, with a public demo available through Browserbase.
  • In internal and third-party benchmarks, Gemini 2.5 Computer Use achieves over 70 percent accuracy and around 225 seconds average latency, while Google has built in safety measures such as per-step action review and developer controls to prevent misuse, including restrictions on actions like bypassing CAPTCHAs or controlling sensitive devices.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.