Cursor's agent swarm tackles one of software's hardest problems and delivers a working browser
Building a web browser from scratch is considered one of the most complex software projects imaginable, which makes it all the more remarkable that Cursor set hundreds of autonomously working AI agents to exactly this task and, after nearly a week, produced a working browser with its own rendering engine.
"I have to admit I'm very surprised to see something this capable emerge so quickly," writes Simon Willison, British programmer and co-creator of the Django web framework. Willison is one of the most popular independent bloggers on Hacker News and coined the term "Prompt Injection" in 2022 for a critical security vulnerability in LLMs (after Jonathan Cefalu had previously reported the problem to OpenAI as "command injection"). His assessments of AI-assisted software development are closely followed in the industry.
Earlier in January, Willison had predicted that an AI-built web browser wouldn't be realistic until 2029 at the earliest. Now he's correcting himself: "I may have been off by three years." The browser renders web pages recognizably correctly, albeit with visible glitches that make clear no existing engine is being reused. But this is roughly the quality of result he had in mind for his 2029 prediction.
Flat hierarchies failed
The path to a working system wasn't straightforward. Cursor's first approach, agents with equal status coordinating through a shared file, failed spectacularly. When an agent wanted to take on a task, it first had to "lock" it so no other agent would start the same work. But agents held these locks too long or forgot to release them entirely. "Twenty agents would slow down to the effective throughput of two or three, with most time spent waiting."
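The locking scheme described above can be sketched in a few lines. This is an illustrative reconstruction, not Cursor's actual code; the `TaskBoard` class and all names are assumptions made for the example. It shows why a forgotten release stalls every other agent:

```python
import time

class TaskBoard:
    """Shared task list where equal-status agents coordinate via locks.
    Illustrative sketch only; Cursor has not published this mechanism."""

    def __init__(self, tasks):
        # task -> (owner, claim_time); None means unclaimed
        self.locks = {t: None for t in tasks}

    def claim(self, agent, task):
        """An agent must lock a task before starting work on it."""
        if self.locks.get(task) is None:
            self.locks[task] = (agent, time.monotonic())
            return True
        return False  # another agent holds the lock, so wait

    def release(self, agent, task):
        """Free the task so other agents can claim it."""
        owner = self.locks.get(task)
        if owner and owner[0] == agent:
            self.locks[task] = None

# The failure mode from the article: an agent claims a task and never
# releases it, so every other agent can only poll and wait.
board = TaskBoard(["parse_css"])
board.claim("agent-1", "parse_css")
# agent-1 forgets to call release(); agent-2 is now blocked:
print(board.claim("agent-2", "parse_css"))  # False
```

With twenty agents contending on a handful of such locks, most time goes to waiting, which matches the reported throughput collapse.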
Without a clear hierarchy, the agents also exhibited surprising behavior. They became risk-averse. "They avoided difficult tasks and made small, safe changes instead. No agent took responsibility for hard problems or end-to-end implementation." Work churned for long periods without real progress.
Planners, workers, and a judge
The solution was clear role separation. Planners continuously explore the codebase and create tasks. They can spawn sub-planners for specific areas, for instance a sub-planner just for CSS rendering or one for the JavaScript engine. This makes planning itself parallel and recursive.
Workers, on the other hand, don't worry about the big picture. They pick up a task, complete it, push their changes, done. At the end of each cycle, a Judge Agent determines whether the project is complete or another iteration should begin.
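The role separation above can be summarized in a minimal control loop. All function names and the toy "codebase" are hypothetical stand-ins for illustration; the real system drives LLM agents, not Python functions:

```python
from collections import deque

def plan(codebase, area=None):
    """Planner: explore the codebase and emit tasks. A planner may spawn
    sub-planners for specific areas (e.g. CSS or the JS engine)."""
    if area is None:
        # top-level planner delegates each area to a sub-planner
        return [t for a in sorted(codebase) for t in plan(codebase, a)]
    return [f"{area}: implement {feature}" for feature in codebase[area]]

def work(task):
    """Worker: pick up one task, complete it, push the change. Workers
    don't see the big picture."""
    return f"pushed change for [{task}]"

def judge(remaining):
    """Judge: after each cycle, decide whether the project is complete
    or another iteration should begin."""
    return "done" if not remaining else "iterate"

codebase = {"css": ["flexbox"], "js": ["closures"]}
queue = deque(plan(codebase))
results = []
while judge(queue) == "iterate":
    # workers drain the current task queue in parallel in the real system
    results.extend(work(queue.popleft()) for _ in range(len(queue)))
```

Because planning is itself recursive, the task queue can keep growing while workers drain it; the judge is the only component that looks at the whole project state.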
Prompts still matter
"Many of our improvements came from removing complexity rather than adding it," writes Wilson Lin from Cursor. A dedicated integrator role for quality control and conflict resolution, for example, "created more bottlenecks than it solved." The workers could handle conflicts themselves.
Model choice proved crucial for long autonomous work. GPT-5.2 was found to be significantly better at "following instructions, keeping focus, avoiding drift, and implementing things precisely and completely." Opus 4.5, by contrast, "tends to stop earlier and take shortcuts when convenient," handing control back quickly rather than completing a task fully.
Different models for different roles worked best. GPT-5.2 proved "a better planner than GPT-5.1-Codex, even though the latter is trained specifically for coding." Cursor now uses the best-suited model for each role.
Another insight: "A surprising amount of the system's behavior comes down to how we prompt the agents. The harness and models matter, but the prompts matter more."
The browser project spans roughly one million lines of code across more than 1,000 files (available on GitHub) and took several weeks to build. "Despite the codebase size, new agents can still understand it and make meaningful progress. Hundreds of workers run concurrently, pushing to the same branch with minimal conflicts," Cursor writes.
Agents tackle major framework migrations
Alongside the browser project, the company had agents handle a Solid-to-React migration in their own codebase, a massive frontend framework overhaul. This took more than three weeks and touched +266,000/-193,000 lines of code. The result already passes CI tests but still needs thorough human review.
Another agent sped up video rendering with an efficient Rust implementation that's expected to ship soon. Several other projects remain in progress: a Language Server Protocol implementation for Java (7,400 commits, 550,000 lines of code), a Windows 7 emulator (14,600 commits, 1.2 million lines), and an Excel clone (12,000 commits, 1.6 million lines).