
Cursor's agent swarm tackles one of software's hardest problems and delivers a working browser

Image: Nano Banana Pro prompted by THE DECODER

Key Points

  • Cursor deployed hundreds of autonomous AI agents to build a functional web browser with its own rendering engine in nearly a week.
  • Initial attempts with flat hierarchies failed as agents became risk-averse and bottlenecked, but success came through clear role separation: Planners create tasks, Workers execute them, and a Judge Agent determines completion.
  • Cursor found that prompt design matters more than infrastructure, and that GPT-5.2 outperforms coding-specific models at planning. Ongoing projects include a Windows 7 emulator with 1.2 million lines of code and an Excel clone with 1.6 million lines.

Building a web browser from scratch is considered one of the most complex software projects imaginable. All the more remarkable, then, that Cursor set hundreds of autonomously operating AI agents to exactly this task and, after nearly a week, ended up with a working browser built on its own rendering engine.

"I have to admit I'm very surprised to see something this capable emerge so quickly," writes Simon Willison, British programmer and co-creator of the Django web framework. Willison is one of the most popular independent bloggers on Hacker News and coined the term "Prompt Injection" in 2022 for a critical security vulnerability in LLMs (after Jonathan Cefalu had previously reported the problem to OpenAI as "command injection"). His assessments of AI-assisted software development are closely followed in the industry.

Earlier in January, Willison had predicted that an AI-built web browser wouldn't be realistic until 2029 at the earliest. Now he's correcting himself: "I may have been off by three years." The browser renders web pages recognizably correctly, albeit with visible glitches that make clear no existing engine is under the hood. That is roughly the quality of result he had in mind for his 2029 prediction.

Flat hierarchies failed

The path to a working system wasn't straightforward. Cursor's first approach, agents with equal status coordinating through a shared file, failed spectacularly. When an agent wanted to take on a task, it first had to "lock" it so no other agent would start the same work. But agents held these locks too long or forgot to release them entirely. "Twenty agents would slow down to the effective throughput of two or three, with most time spent waiting."
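To make the failure mode concrete, here is a minimal TypeScript sketch of such shared-file locking. Everything here is illustrative rather than Cursor's actual code, and the lease timeout is a common mitigation for forgotten locks, not something the source describes.

```typescript
// Illustrative sketch of shared-file task locking; all names and the
// lease timeout are assumptions, not Cursor's actual code.
interface Task {
  id: string;
  description: string;
  lockedBy?: string; // agent currently holding the lock
  lockedAt?: number; // epoch ms when the lock was taken
}

const LOCK_TTL_MS = 10 * 60 * 1000; // lease expiry reclaims forgotten locks

function claimTask(tasks: Task[], agentId: string, now = Date.now()): Task | undefined {
  for (const task of tasks) {
    const expired = task.lockedAt !== undefined && now - task.lockedAt > LOCK_TTL_MS;
    if (!task.lockedBy || expired) {
      task.lockedBy = agentId; // in practice this write must itself be atomic
      task.lockedAt = now;
      return task;
    }
  }
  // Every task is locked: agents sit idle waiting, which is how twenty
  // agents degrade to the throughput of two or three.
  return undefined;
}
```

Even with a lease, lock-based claiming serializes the whole swarm around a single contention point, which matches the throughput collapse Cursor describes.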


Without a clear hierarchy, the agents also exhibited surprising behavior. They became risk-averse. "They avoided difficult tasks and made small, safe changes instead. No agent took responsibility for hard problems or end-to-end implementation." Work churned for long periods without real progress.

Planners, workers, and a judge

The solution was clear role separation. Planners continuously explore the codebase and create tasks. They can spawn sub-planners for specific areas, for instance a sub-planner just for CSS rendering or one for the JavaScript engine. This makes planning itself parallel and recursive.

Workers, on the other hand, don't worry about the big picture. They pick up a task, complete it, push their changes, done. At the end of each cycle, a Judge Agent determines whether the project is complete or another iteration should begin.
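Cursor hasn't published its harness, but the described cycle maps onto a simple loop. A hypothetical TypeScript sketch, with every name (Harness, runPlanner, swarmLoop, and so on) invented for illustration:

```typescript
// Hypothetical sketch of the planner/worker/judge cycle; all names are
// invented for illustration, not taken from Cursor's harness.
type Task = { id: string; description: string };

interface Harness {
  // Planners explore the codebase and enqueue tasks, possibly recursing
  // into sub-planners for areas like CSS rendering or the JS engine.
  runPlanner(area: string, enqueue: (t: Task) => void): Promise<void>;
  // Workers pick up one task, implement it, and push their changes.
  runWorker(task: Task): Promise<void>;
  // The judge decides whether the project is done or needs another cycle.
  runJudge(): Promise<"complete" | "iterate">;
}

async function swarmLoop(h: Harness): Promise<void> {
  const queue: Task[] = [];
  for (;;) {
    await h.runPlanner("root", (t) => queue.push(t));
    // Workers run in parallel, with no shared locks to contend over.
    await Promise.all(queue.splice(0).map((t) => h.runWorker(t)));
    if ((await h.runJudge()) === "complete") break;
  }
}
```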

Prompts still matter

"Many of our improvements came from removing complexity rather than adding it," writes Wilson Lin from Cursor. A dedicated integrator role for quality control and conflict resolution, for example, "created more bottlenecks than it solved." The workers could handle conflicts themselves.


Model choice proved crucial for long autonomous work. GPT-5.2 was found to be significantly better at "following instructions, keeping focus, avoiding drift, and implementing things precisely and completely." Opus 4.5, by contrast, "tends to stop earlier and take shortcuts when convenient," yielding back control quickly rather than completing a task fully.

Different models for different roles worked best. GPT-5.2 proved "a better planner than GPT-5.1-Codex, even though the latter is trained specifically for coding." Cursor now uses the best-suited model for each role.
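In configuration terms, that per-role assignment might look like the sketch below. The mapping structure is an assumption, and the judge's model is a placeholder; only the planner and worker findings come from Cursor's report.

```typescript
// Hypothetical per-role model assignment; the structure is an assumption.
const MODEL_BY_ROLE = {
  planner: "gpt-5.2", // reported to out-plan the coding-tuned GPT-5.1-Codex
  worker: "gpt-5.2",  // reported best at precise, complete implementation
  judge: "gpt-5.2",   // placeholder: the source doesn't say what judges run
} as const;

type Role = keyof typeof MODEL_BY_ROLE;

function modelFor(role: Role): string {
  return MODEL_BY_ROLE[role];
}
```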

Another insight: "A surprising amount of the system's behavior comes down to how we prompt the agents. The harness and models matter, but the prompts matter more."

More projects running

The browser isn't the only experiment. Cursor also had agents perform a Solid-to-React migration in its own codebase, a massive frontend framework overhaul. It took over three weeks and touched +266,000/-193,000 lines of code. The result already passes CI but still needs human review.
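For a sense of what such a migration involves: Solid and React share JSX syntax but differ fundamentally in their reactivity models, so the port is far more than a rename. A minimal counter in both frameworks (my comparison, not code from Cursor's codebase; in practice the two would live in separate projects):

```typescript
import { createSignal } from "solid-js";
import { useState } from "react";

// Solid: signals are getter functions; the component body runs exactly
// once, and only the reactive expressions re-execute on updates.
function SolidCounter() {
  const [count, setCount] = createSignal(0);
  return <button onClick={() => setCount(count() + 1)}>{count()}</button>;
}

// React: state is a plain value; the whole component function re-runs on
// every update, so a port must also rethink memoization, effects, and
// derived state, not just swap createSignal for useState.
function ReactCounter() {
  const [count, setCount] = useState(0);
  return <button onClick={() => setCount(count + 1)}>{count}</button>;
}
```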

Another agent made video rendering 25x faster through an efficient Rust implementation. This code will be in production soon.

More projects are still running, including a Java language server (7,400 commits, 550,000 lines of code), a Windows 7 emulator (14,600 commits, 1.2 million lines), and an Excel clone (12,000 commits, 1.6 million lines).


Source: Cursor