Cursor quietly built its new coding model on top of Chinese open-source Kimi K2.5
Key Points
- With Composer 2, Cursor has released its own AI model for software development at $0.50 per million input tokens and $2.50 per million output tokens, significantly cheaper than Claude Opus 4.6 ($5.00/$25.00) and GPT-5.4 ($2.50/$15.00).
- The code-specialized model scores 61.3 on Cursor's internal CursorBench, a major improvement over its predecessor Composer 1.5 (44.2) and competitive with Claude Opus 4.6 (58.2) and GPT-5.4 Thinking (63.9).
- Building its own model is a strategic necessity for Cursor: the company competes directly with Anthropic and OpenAI while simultaneously depending on their models, leaving it with limited pricing flexibility as both providers offer low-cost flat-rate plans.
Update from March 21, 2026:
Cursor's new AI model is built on top of the Chinese open-source model Kimi K2.5. Cursor employee Lee Robinson says roughly a quarter of the pretraining comes from the base model, with Cursor doing the rest through fine-tuning and continued training. Because of that additional training, the model's benchmark results differ from the original Kimi K2.5. The commercial license runs through inference partner Fireworks.
Cursor never disclosed any of this, drawing criticism. It only came out after Kimi employees dug into the model themselves. Cursor co-founder Aman Sanger owned up to the mistake: "It was a miss to not mention the Kimi base in our blog from the start. We'll fix that for the next model."
The bigger question is why Cursor kept quiet in the first place. The most likely answer: admitting it would mean conceding that, unlike Anthropic and OpenAI, Cursor can't build its own frontier model. Both competitors pour billions into proprietary base models; Cursor simply can't play at that level.
Cursor could have pitched open-source fine-tuning as a billion-dollar shortcut
There's nothing wrong with taking a strong open-source model and fine-tuning it for a specific use case, though. It's common practice and often the smarter path, especially for a company whose real strength isn't pretraining massive language models but building a coding editor. The problem is shipping someone else's base model under your own brand without saying so.
There's another way to look at this, though. If Cursor's fine-tuned model can genuinely compete with billion-dollar proprietary efforts, that's an uncomfortable question for the frontier labs: how much is a proprietary base model actually worth when a small team with smart fine-tuning can get similar results?
Cursor would have been much better off owning the open-source angle from the start, positioning comparable results from a fine-tuned open-source model as proof that billion-dollar proprietary development isn't the only path forward. That framing would have put the pressure on OpenAI and Anthropic instead of leaving Cursor on the defensive.
Original article from March 19, 2026:
Cursor takes on OpenAI and Anthropic with Composer 2, a code-only model built to match rivals at a fraction of the cost
Cursor releases Composer 2, the second generation of its own AI model for software development. The model aims to match the leading coding models from Anthropic and OpenAI at a fraction of the cost.
The model is now available in Cursor and in the early alpha of the new "Glass" interface. Pricing starts at 0.50 dollars per million input tokens and 2.50 dollars per million output tokens. A faster variant that Cursor says delivers identical intelligence costs 1.50 and 7.50 dollars per million tokens respectively, and ships as the default.
| Model | Price per 1 million tokens, input / output | Hint |
|---|---|---|
| Composer 2 | 0.50 / 2.50 dollars | Standard version |
| Composer 2 Fast | 1.50 / 7.50 dollars | Faster variant with the same intelligence, according to Cursor |
| Claude Opus 4.6 | 5.00 / 25.00 dollars | API price according to Anthropic, valid for any context length |
| GPT-5.4 | 2.50 / 15.00 dollars, short context; 5.00 / 22.50 dollars, long context | OpenAI price depending on context length |
On pure API pricing, Cursor Composer 2 comes in well below both Claude Opus 4.6 and GPT-5.4. Even the faster variant still undercuts both competitors by a wide margin on token costs.
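The gap becomes concrete when the per-token prices are applied to a workload. A minimal sketch, using the prices from the table above; the monthly token counts are illustrative assumptions, not Cursor usage data:

```python
# USD per 1 million tokens (input, output), taken from the pricing table above
PRICES = {
    "Composer 2": (0.50, 2.50),
    "Composer 2 Fast": (1.50, 7.50),
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5.4 (short context)": (2.50, 15.00),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given number of input and output tokens."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Hypothetical monthly workload: 10M input tokens, 2M output tokens
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 10_000_000, 2_000_000):.2f}")
```

On these assumed numbers, the standard Composer 2 workload costs $10.00 against $100.00 for Claude Opus 4.6 and $55.00 for GPT-5.4 at short context.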
Co-founder Aman Sanger told Bloomberg the model was trained exclusively on code data. That narrow focus made it possible to build a smaller, more cost-effective model. "It won't help you do your taxes. It won't be able to write poems," Sanger said.
Reinforcement learning on long coding tasks drives quality gains
According to Cursor, the quality improvements over its predecessor come down to a stronger first pass of continued pretraining, which provides a better foundation for the reinforcement learning that follows. Training runs on so-called long-horizon coding tasks, programming challenges that require hundreds of individual actions to complete.
The numbers Cursor published show a major jump, especially compared to earlier Composer versions. On CursorBench, the company's internal benchmark for coding tasks, Composer 2 climbs from 44.2 (Composer 1.5) to 61.3. The model also posts gains on Terminal Bench 2.0, a benchmark for agent-based tasks in the terminal, and on SWE-bench Multilingual, which tests software engineering tasks across multiple programming languages.
| Model | CursorBench | Terminal Bench 2.0 | Terminal Bench 2.0 optimized | SWE-bench Multilingual |
|---|---|---|---|---|
| Composer 2 | 61.3 | 61.7 | 73.7 | — |
| Composer 1.5 | 44.2 | 47.9 | 65.9 | — |
| Composer 1 | 38.0 | 40.0 | 56.9 | — |
| Claude Opus 4.6 | 58.2 | 58.0 | 65.4 | 77.8 |
| GPT-5.4 Thinking | 63.9 | 75.1 | N/A | — |
Terminal Bench 2.0 scores aren't directly comparable across the board, since results also depend on the agent, harness, and settings. For Claude Opus 4.6, 58.0 is the Claude Code value; 65.4 is an additional optimized value published by Anthropic. For GPT-5.4 Thinking, only a single published Terminal Bench value is available.
Building its own model is about survival, not just performance
Cursor competes directly with Anthropic and OpenAI, both of which are shipping increasingly powerful AI models for software development. According to Bloomberg, Cursor now has more than one million daily users and around 50,000 enterprise customers. The company is also in talks about a new funding round at a valuation of roughly 50 billion dollars.
At the same time, Cursor faces a structural dilemma. The platform still supports models from OpenAI and Anthropic, which means it's competing with the very providers whose technology it has relied on. As long as Cursor buys third-party models, its pricing, performance, and margins all depend on companies that sell directly to the same customers.
Anthropic, in particular, is making aggressive moves in the coding market with Claude Code. Cursor reportedly estimates that a single Claude Code subscription at 200 dollars a month can rack up around 5,000 dollars in actual compute costs. That highlights the structural problem: when you build on someone else's model, you're paying full price for compute that the model provider can heavily subsidize in its own product.
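The scale of that reported subsidy can be restated as simple arithmetic. Both figures below come from the article's reporting of Cursor's estimate, not from confirmed Anthropic data:

```python
# Figures as reported in the article: a $200/month Claude Code subscription
# can reportedly incur around $5,000/month in actual compute costs.
subscription_price = 200.0   # USD per month
estimated_compute = 5_000.0  # USD per month, Cursor's reported estimate

# Ratio of compute cost to subscription revenue
subsidy_ratio = estimated_compute / subscription_price
print(f"Compute cost is {subsidy_ratio:.0f}x the subscription price")
```

If the estimate holds, Anthropic would be delivering roughly 25 times more compute than the flat-rate price covers, a margin a pure reseller cannot match.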
That doesn't leave Cursor much room. According to the report, consumer subscriptions are already running at negative margins, with enterprise contracts carrying the business. And the longer-term risk may be even bigger—as AI coding agents get more capable, users might skip the IDE entirely and work with these systems straight from the model provider.