OpenAI has just launched Operator, an AI assistant that can navigate the web on its own. The tool, currently only available to US ChatGPT Pro subscribers, represents a step toward AI assistants that can work autonomously.
Operator can "see" websites using GPT-4o's vision capabilities and interact with them based on screenshots: typing, clicking, and scrolling through web pages without needing any special integration with the sites themselves. It's powered by a new AI model called Computer-Using Agent (CUA).
Users simply tell Operator what they want to accomplish, and it handles the rest in a separate browser window within the ChatGPT interface. According to OpenAI, it can handle all sorts of routine browser tasks, from filling out forms to ordering groceries online.
Users can personalize the system with their own custom instructions, either for specific sites or across all sites. Prompts can also be saved on the home page for quick access, and users can run multiple tasks at once in separate chat windows.
A new model for computing tasks
Under the hood, Operator runs on CUA, which works by processing the screen's raw pixel data and controlling a virtual mouse and keyboard. The model combines GPT-4o's ability to process images with advanced reasoning skills developed through reinforcement learning.
The system works in a repeating cycle of three phases. First, it takes a screenshot of the current screen. Next, it uses chain-of-thought reasoning to decide what to do, taking into account both what it currently sees and the steps it has already taken.
These "inner monologues" help it make fewer mistakes and work more accurately, much like OpenAI's o-series reasoning models. Finally, it acts by clicking, scrolling, or typing, and repeats the cycle until it either completes the task or needs the user to step in.
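The perceive, reason, act cycle can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not OpenAI's implementation: `capture_screenshot`, `decide`, and `execute` are hypothetical stand-ins for the screen capture, the CUA model call, and the browser controller, and the `Action` kinds are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    # Hypothetical action kinds: "click", "scroll", "type", "done", "ask_user"
    kind: str
    detail: str = ""


@dataclass
class AgentState:
    # Running record of past reasoning and actions, fed back into each decision
    history: List[str] = field(default_factory=list)


def run_agent(
    capture_screenshot: Callable[[], bytes],
    decide: Callable[[bytes, List[str]], Tuple[str, Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> Optional[Action]:
    """Repeat perceive -> reason -> act until done, a user handoff, or max_steps."""
    state = AgentState()
    for _ in range(max_steps):
        screenshot = capture_screenshot()                     # 1. perception
        thought, action = decide(screenshot, state.history)  # 2. reasoning over screen + history
        state.history.append(f"thought: {thought}")
        if action.kind in ("done", "ask_user"):
            return action                                     # finished, or the user must step in
        execute(action)                                       # 3. action in the browser
        state.history.append(f"action: {action.kind} {action.detail}")
    return None                                               # step budget exhausted


if __name__ == "__main__":
    # Scripted stand-in for the model: picks the next action by counting past thoughts
    script = [Action("click", "search box"), Action("type", "oat milk"), Action("done")]
    executed: List[Action] = []

    def fake_decide(screenshot: bytes, history: List[str]) -> Tuple[str, Action]:
        step = sum(1 for h in history if h.startswith("thought"))
        return f"step {step}", script[step]

    result = run_agent(lambda: b"", fake_decide, executed.append)
    print(result.kind, [a.kind for a in executed])  # done ['click', 'type']
```

Feeding the accumulated history back into `decide` mirrors the point that the model weighs both what it currently sees and what it has already done; the `max_steps` cap is an added assumption so the sketch cannot loop forever.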
Testing shows promise, but work remains
OpenAI says CUA performs well on standard benchmarks. It scores 58.1% on WebArena, a benchmark built on simulated websites for tasks like online shopping and content management. For example, it asks the agent to search for customer data in a browser-based CRM system.
The system does better on real websites: on the WebVoyager benchmark, which tests it on sites like Amazon and Google Maps, it achieves an 87% success rate. On the OSWorld benchmark, however, which covers more complex tasks across a full computer operating system, such as combining PDFs from emails, its success rate drops to 38.1%.
OpenAI says the system still struggles with complex interfaces, such as those for building presentations or managing calendars. The company stresses that Operator is only a research preview: it plans to refine the system based on user feedback while working to make it more affordable and available to more people.
OpenAI built three layers of safeguards into Operator. First, for important actions like logging in or making payments, it asks for the user's permission. Second, on sensitive sites such as email or banking, users can monitor every action in "watch mode." Third, a dedicated monitoring model looks out for suspicious behavior and can halt tasks if needed, backed by a detection system that protects against malicious websites trying to manipulate the agent.
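A minimal sketch of how layered checks like these might gate each proposed action before it runs. Everything here is hypothetical and for illustration only: the action kinds, the example domains, and the `confirm`, `user_watching`, and `looks_suspicious` callbacks stand in for whatever mechanisms OpenAI actually uses.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy tables, for illustration only
CONFIRM_KINDS = {"login", "payment"}                           # need explicit user permission
WATCH_MODE_DOMAINS = {"mail.example.com", "bank.example.com"}  # need an active observer


@dataclass
class ProposedAction:
    kind: str
    domain: str


class TaskHalted(Exception):
    """Raised when the monitor decides the whole task must stop."""


def gate(
    action: ProposedAction,
    confirm: Callable[[ProposedAction], bool],
    user_watching: Callable[[], bool],
    looks_suspicious: Callable[[ProposedAction], bool],
) -> bool:
    """Return True if the action may run now, False if it must wait or be skipped."""
    # Monitoring layer: a separate check can halt the task outright
    if looks_suspicious(action):
        raise TaskHalted(f"monitor flagged {action.kind} on {action.domain}")
    # Confirmation layer: high-impact actions need the user's permission
    if action.kind in CONFIRM_KINDS and not confirm(action):
        return False
    # Watch-mode layer: on sensitive sites, proceed only while the user is watching
    if action.domain in WATCH_MODE_DOMAINS and not user_watching():
        return False
    return True


if __name__ == "__main__":
    pay = ProposedAction("payment", "shop.example.com")
    print(gate(pay, confirm=lambda a: False, user_watching=lambda: True,
               looks_suspicious=lambda a: False))  # False: the user declined
```

Running the monitor check first means a flagged action stops the task before the user is even asked to confirm it; that ordering is a design choice of this sketch, not a documented detail of Operator.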
What's next
Currently, only Pro users in the US can access Operator. OpenAI plans to roll it out to Plus, Team, and Enterprise users later, and eventually build it right into ChatGPT. They're also working on an API version of the CUA model for developers.
This launch puts OpenAI in direct competition with similar tools from other companies. Anthropic launched Claude Computer Use last fall, and Google released Project Mariner last December. Both of these services are still limited to small user groups and, like Operator, don't yet work reliably across all tasks.