Moonshot AI releases Kimi K2.5, claims most powerful open-weight model with 100-agent coordination
Key Points
- Chinese company Moonshot AI has released Kimi K2.5, an open-weight model that automatically distributes complex tasks to up to 100 sub-agents working in parallel, cutting execution time by up to 4.5x according to the company.
- For training, Moonshot AI developed a method called "Parallel-Agent Reinforcement Learning," where an orchestrator learns to divide tasks among specialized agents like "AI researchers" or "fact-checkers."
- In benchmarks, K2.5 outperforms GPT-5.2 and Gemini 3 Pro on agentic tasks but trails Claude 4.5 Opus and GPT-5.2 on software engineering tests.
Moonshot AI has released Kimi K2.5, which the company says is the most powerful open-weight model available. The model can independently coordinate up to 100 AI agents working in parallel on complex tasks.
Moonshot AI has unveiled Kimi K2.5, a multimodal language model that builds on Kimi K2, which launched in July 2025.
The headline feature is "Agent Swarm," a system in which the model independently coordinates up to 100 sub-agents working in parallel on a single task. According to Moonshot AI, these agents can execute up to 1,500 tool calls and cut execution time by up to 4.5x compared to a single agent.
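One way to see why 100 parallel agents do not translate into a 100x speedup is Amdahl's law, which caps the gain by the share of work that must stay sequential. The sketch below is a back-of-the-envelope illustration, not Moonshot AI's analysis. It assumes the 4.5x and 100-agent figures describe the same run, which the company does not state, and it ignores coordination overhead. The function names are hypothetical.

```python
def amdahl_speedup(serial_fraction, workers):
    """Speedup predicted by Amdahl's law for a given sequential share of the work."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def implied_serial_fraction(speedup, workers):
    """Invert Amdahl's law: which sequential share would yield this speedup?"""
    return (workers / speedup - 1.0) / (workers - 1.0)


# Figures quoted in the article; pairing them is an assumption.
serial_share = implied_serial_fraction(speedup=4.5, workers=100)
print(f"implied sequential share: {serial_share:.1%}")
print(f"check: {amdahl_speedup(serial_share, 100):.2f}x")
```

Under these assumptions, roughly a fifth of the work would have to run sequentially, for example planning the subtasks and merging the results.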
Moonshot AI continued training the model on roughly 15 trillion tokens and bills it as the "most powerful open-source model" available. According to the company, the gains are most visible when generating visually polished frontend designs.
K2.5 uses a Mixture-of-Experts architecture with one trillion total parameters, of which 32 billion are active per token. The model has 384 experts, with eight selected per token. It uses MoonViT, a 400-million-parameter model, as its vision encoder. The context window spans 256,000 tokens.
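To make the expert selection concrete, here is a minimal sketch of top-k routing in a Mixture-of-Experts layer. It is a generic illustration and not Moonshot AI's implementation. Only the counts (384 experts, eight per token, 32 billion of one trillion parameters) come from the article. The random router scores, the softmax over the selected experts, and the name `route_token` are assumptions.

```python
import math
import random

NUM_EXPERTS = 384  # from the article
TOP_K = 8          # from the article


def route_token(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected logits only (one common convention; assumed here).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token; a real router computes these
# from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_token(logits)

expert_fraction = TOP_K / NUM_EXPERTS  # share of experts used per token
param_fraction = 32e9 / 1e12           # active / total parameters

print(f"experts used per token: {len(chosen)} of {NUM_EXPERTS} ({expert_fraction:.1%})")
print(f"parameters active per token: {param_fraction:.1%}")
```

The active parameter share (3.2 percent) is higher than the expert share (about 2.1 percent), which is consistent with components such as attention layers and embeddings running for every token regardless of routing.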
Orchestrator learns to distribute work across agents
For training, Moonshot AI developed a method called "Parallel-Agent Reinforcement Learning" (PARL). A trainable orchestrator agent learns to break tasks into parallelizable subtasks. Dynamically created sub-agents then execute these subtasks, each taking on specialized roles like "AI researcher," "physics researcher," or "fact-checker."

A common problem with these systems is what Moonshot AI calls "Serial Collapse." The orchestrator falls back to sequential execution even when parallel capacity is available. To counter this, PARL uses a staged reward system that encourages parallelism early in training and shifts focus to task quality later.
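The article does not give PARL's actual reward function. As a rough sketch of the idea, a staged reward can blend a parallelism bonus with a task-quality score and shrink the bonus as training progresses. The linear schedule, the starting weight of 0.5, and all names below are assumptions for illustration.

```python
def parallelism_weight(step, total_steps, start=0.5, end=0.0):
    """Linearly anneal the weight on the parallelism bonus from start to end."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


def staged_reward(task_quality, parallel_fraction, step, total_steps):
    """Blend a parallelism bonus with task quality; both inputs lie in [0, 1]."""
    w = parallelism_weight(step, total_steps)
    return w * parallel_fraction + (1.0 - w) * task_quality


TOTAL_STEPS = 1000
for step in (0, 500, 1000):
    serial = staged_reward(0.6, parallel_fraction=0.0, step=step, total_steps=TOTAL_STEPS)
    parallel = staged_reward(0.6, parallel_fraction=1.0, step=step, total_steps=TOTAL_STEPS)
    print(f"step {step}: serial={serial:.2f} parallel={parallel:.2f}")
```

Early in training, a fully sequential rollout earns less than a parallel one of the same quality, which discourages serial collapse. By the final step, the two are rewarded equally, so only task quality matters.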

The company demonstrates this with a task where K2.5 had to identify the top three YouTube creators in 100 different niches. The model independently created 100 sub-agents that researched in parallel and compiled the results into a structured table.
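The fan-out pattern in this demo can be sketched with Python's `asyncio`. This is a toy illustration of an orchestrator dispatching one sub-agent per niche and collecting the results, not Kimi's API. `run_sub_agent` is a hypothetical stand-in that sleeps briefly and returns placeholder names instead of calling a model or any tools.

```python
import asyncio

MAX_SUB_AGENTS = 100  # upper bound quoted in the article


async def run_sub_agent(niche):
    """Hypothetical sub-agent: stands in for a model call plus tool use."""
    await asyncio.sleep(0.01)  # simulate I/O-bound research
    creators = [f"{niche}-creator-{rank}" for rank in (1, 2, 3)]
    return {"niche": niche, "top_creators": creators}


async def orchestrate(niches):
    """Fan out one sub-agent per niche, with at most MAX_SUB_AGENTS running at once."""
    limit = asyncio.Semaphore(MAX_SUB_AGENTS)

    async def guarded(niche):
        async with limit:
            return await run_sub_agent(niche)

    # gather preserves input order, so rows line up with the niche list.
    return await asyncio.gather(*(guarded(n) for n in niches))


niches = [f"niche-{i:03d}" for i in range(100)]
table = asyncio.run(orchestrate(niches))
print(len(table), "rows; first row:", table[0])
```

Because the simulated work is I/O-bound, all 100 sub-agents overlap and the whole run takes about as long as a single call. Real sub-agents would also be constrained by rate limits and by the time needed to merge their results.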
Visual input drives coding capabilities
Moonshot AI positions K2.5 as particularly strong in coding, especially frontend development. The model can create complete user interfaces with interactive layouts and animations from simple text descriptions.
K2.5 can also reason about images and videos and generate code from them. The company shows how the model can reconstruct a website from a video or calculate and mark the shortest path through a maze image.
Benchmarks show strong performance
In the benchmarks Moonshot AI published, K2.5 hits top scores on some tests but trails the competition on others. For agentic tasks, K2.5 performs significantly better than rivals in some cases. On BrowseComp, the model reaches 74.9 percent, while GPT-5.2 hits 65.8 percent and Gemini 3 Pro reaches 59.2 percent. K2.5 also leads on DeepSearchQA with 77.1 percent, ahead of Claude 4.5 Opus at 76.1 percent.

On SWE-Bench Verified for software engineering tasks, K2.5 scores 76.8 percent. GPT-5.2 and Claude 4.5 Opus reach 80 and 80.9 percent, respectively. On the multilingual SWE-Bench tests, Claude 4.5 Opus leads with 77.5 percent, followed by K2.5 at 73 percent.
For image and video benchmarks, K2.5 keeps pace with the competition. On MMMU Pro, it reaches 78.5 percent, just behind Gemini 3 Pro at 81 percent. On VideoMMMU, K2.5 scores 86.6 percent, slightly ahead of GPT-5.2 but just behind Gemini 3 Pro.
K2.5 is available through Kimi.com, the Kimi app, and an API. The weights can be downloaded from Hugging Face. Agent Swarm is currently in beta and available to paying users, who receive free credits for it. Four modes are offered: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm.
Moonshot AI was founded in 2023 and has quickly established itself as one of China's leading language model providers with the Kimi model family. The company competes with US providers like OpenAI and Anthropic as well as Chinese rivals like DeepSeek and its V3.2 model.