ChatGPT can now handle complex tasks on its own, ranging from web searches to building presentations. The new feature pulls together earlier research efforts and gives the chatbot access to a virtual computer environment.

According to OpenAI, ChatGPT now completes tasks by proactively choosing from a toolbox of so-called agentic abilities, running them directly in its own virtual computer. This means users can ask for things like planning and shopping for a breakfast menu, analyzing competitors and creating a presentation, or organizing calendar entries based on current news.

The "ChatGPT agent" is designed to navigate websites, filter results, prompt for logins when needed, execute code, run analyses, and create editable documents like presentations or spreadsheets.

At the core of this update is what OpenAI calls a "unified agentic system." The company says it merges the strengths of earlier tools like "Operator" (for website interaction) and "Deep Research" (for synthesizing information) with ChatGPT's intelligence. Previously, these systems worked separately: Operator couldn't analyze data, and Deep Research couldn't interact with websites. Combined, a single agent can both act on websites and analyze what it finds there.

A toolbox for complex workflows

The ChatGPT agent comes loaded with multiple tools: a visual browser for graphical interfaces, a text-based browser for simpler web queries, a terminal, and direct API access. The AI is supposed to pick the best tool for the job automatically. Via connectors, the agent can also access apps like Gmail or GitHub.

All of this happens in a virtual computer environment that keeps track of context across different tools. OpenAI emphasizes that users always stay in control. The agent asks for permission before taking any action with consequences, and users can interrupt, take over the browser, or stop tasks at any time. The agent will also proactively ask for more details if it needs them to complete a goal.

OpenAI says the underlying model powering the agent achieves new state-of-the-art results on several benchmarks. On "Humanity's Last Exam" (HLE), which tests AI on expert-level questions, the model hits a new high score of 41.6 percent. For the tough math benchmark "FrontierMath," it clocks in at 27.4 percent accuracy.

On "DSBench," which measures performance on realistic data science tasks, OpenAI claims that the ChatGPT agent significantly outperforms humans. On "SpreadsheetBench," which tests spreadsheet handling, the agent scored 45.5 percent, compared to 20 percent for Copilot in Excel. On these spreadsheet tasks, however, humans still come out on top.

For web navigation, the "BrowseComp" benchmark shows a new state-of-the-art result of 68.9 percent, a 17.4-point jump over Deep Research.

The new agent is rolling out now for Pro, Plus, and Team users, with Enterprise and Education customers next in line over the coming weeks. Access for users in the European Economic Area and Switzerland is still in preparation. The presentation-building feature is in beta, and OpenAI says results may still look rough around the edges.

Pro users get 400 messages per month, while Plus and Team users receive 40. For the first time, additional messages will be available for purchase.

OpenAI addresses new risks and safety concerns

Letting ChatGPT take actions on the web introduces new risks, especially around user data. OpenAI says the overall risk profile is higher. The company is focusing on protecting against "prompt injection," where attackers try to manipulate the agent with hidden instructions in web pages.

OpenAI's countermeasures include training the model to spot such attacks, monitoring systems, and requiring explicit user confirmation before any high-impact action. Some critical tasks, like sending emails, additionally run in a "Watch Mode" in which the user monitors the agent, while risky actions such as bank transfers are blocked by default.
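
The tiers described above (confirm, supervise, block) can be pictured as a simple lookup from action type to required safeguard. The following is only an illustrative sketch, not OpenAI's implementation: the action names, the `Decision` enum, the `decide` function, and the fallback for unknown actions are all assumptions made for this example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"            # no consequences, proceed without asking
    CONFIRM = "confirm"        # ask the user for explicit confirmation first
    WATCH_MODE = "watch_mode"  # user must monitor the agent while it acts
    BLOCK = "block"            # refused by default


# Hypothetical action categories; only the email and bank-transfer
# examples come from the article, the rest are assumed for illustration.
POLICY = {
    "read_page": Decision.ALLOW,
    "submit_purchase": Decision.CONFIRM,
    "send_email": Decision.WATCH_MODE,
    "bank_transfer": Decision.BLOCK,
}


def decide(action: str) -> Decision:
    """Return the safeguard tier for an action.

    Unknown actions fall back to asking the user. This conservative
    default is an assumption of the sketch, not a documented behavior.
    """
    return POLICY.get(action, Decision.CONFIRM)


if __name__ == "__main__":
    for name in ("read_page", "send_email", "bank_transfer", "rename_file"):
        print(name, "->", decide(name).value)
```

In this toy model an action that isn't listed is treated like a high-impact one and requires confirmation, the cautious choice when an agent cannot tell how consequential a step is.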

Because of these new abilities, OpenAI classifies the agent as having "high biological and chemical capability" under its Preparedness Framework and has activated additional safeguards. According to the company, this is the most comprehensive security architecture it's ever implemented for ChatGPT. Measures include a detailed threat model, special training to prevent misuse in biological and chemical domains, continuous monitoring with classifiers and reasoning monitors, and well-defined escalation processes for suspicious activity.

During development, OpenAI worked with external biosecurity experts, safety institutes, and researchers to review and validate protections. Red-teaming by biology professionals is meant to test defenses in realistic scenarios. OpenAI says it uses a multi-layered approach to safety, involving external partners to catch new risks early. The company is also launching a bug bounty program to help spot risks in the real world.