Moonshot AI releases Kimi K2.5, claims most powerful open-weight model with 100-agent coordination

Jan 28, 2026

Moonshot AI

Key Points

Chinese company Moonshot AI has released Kimi K2.5, an open-weight model that automatically distributes complex tasks to up to 100 sub-agents working in parallel, cutting execution time by up to 4.5x according to the company.
For training, Moonshot AI developed a method called "Parallel-Agent Reinforcement Learning," where an orchestrator learns to divide tasks among specialized agents like "AI researchers" or "fact-checkers."
In benchmarks, K2.5 outperforms GPT-5.2 and Gemini 3 Pro on agentic tasks but trails Claude Opus 4.5 and GPT-5.2 on software engineering tests.

Moonshot AI has released Kimi K2.5, which the company says is the most powerful open-weight model available. The model can independently coordinate up to 100 AI agents working in parallel on complex tasks.

Kimi K2.5 is a multimodal language model that builds on Kimi K2, which launched in July 2025.

The big new feature is "Agent Swarm," a system in which the model independently coordinates up to 100 sub-agents working in parallel on a single task. According to Moonshot AI, these agents can execute up to 1,500 tool calls and cut execution time by up to 4.5x compared to a single agent.

Moonshot AI says it trained the model further on roughly 15 trillion tokens and calls the result the "most powerful open-source model" available. According to the company, the gains are especially noticeable when the model creates visually appealing frontend designs.

K2.5 uses a Mixture-of-Experts architecture with one trillion total parameters, with 32 billion active per token. The model has 384 experts, with eight selected per token. It uses MoonViT with 400 million parameters as its vision encoder. The context window spans 256,000 tokens.
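To make these numbers concrete, here is a minimal sketch of top-k expert routing in plain Python. Only the counts (384 experts, eight per token, 32 billion of one trillion parameters active) come from Moonshot AI. The function name `route_token`, the random router scores, and the choice to apply softmax over only the selected experts are illustrative assumptions, not K2.5's actual router.

```python
import math
import random

NUM_EXPERTS = 384  # expert count reported for K2.5
TOP_K = 8          # experts selected per token, as reported

def route_token(router_logits, k=TOP_K):
    """Hypothetical router: pick the k highest-scoring experts and weight them."""
    # Indices of the k experts with the largest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected scores only (an assumption), so weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

# Random scores stand in for a learned router's output for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts, weights = route_token(logits)

print(f"experts used per token: {TOP_K / NUM_EXPERTS:.1%}")
print(f"parameters active per token: {32e9 / 1e12:.1%}")
```

The point of the design is that each token touches only a small slice of the network, which is how a model with one trillion total parameters can run with the per-token compute of a much smaller one.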

Orchestrator learns to distribute work across agents

For training, Moonshot AI developed a method called "Parallel-Agent Reinforcement Learning" (PARL). A trainable orchestrator agent learns to break tasks into parallelizable subtasks. Dynamically created sub-agents then execute these subtasks, each taking on specialized roles like "AI researcher," "physics researcher," or "fact-checker."

Flowchart: An orchestrator directs specialized sub-agents and assigns them numerous tasks in parallel. | By assigning up to 100 subtasks in parallel, complex workflows can be processed efficiently and in a coordinated manner. | Image: Moonshot AI

A common problem with these systems is what Moonshot AI calls "Serial Collapse." The orchestrator falls back to sequential execution even when parallel capacity is available. To counter this, PARL uses a staged reward system that encourages parallelism early in training and shifts focus to task quality later.
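One way to picture such a staged reward is a parallelism bonus whose weight fades as training progresses. The sketch below is an assumption-laden illustration, not Moonshot AI's formula: the linear schedule, the function names `parallelism_weight` and `staged_reward`, and the idea that both inputs are scores between 0 and 1 are all hypothetical.

```python
def parallelism_weight(step, total_steps, start=1.0, end=0.0):
    """Linearly anneal the parallelism bonus weight from start to end (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def staged_reward(task_quality, parallel_fraction, step, total_steps):
    """Blend a parallelism bonus with task quality; the bonus fades over training.

    task_quality and parallel_fraction are hypothetical scores in [0, 1].
    """
    w = parallelism_weight(step, total_steps)
    return w * parallel_fraction + (1.0 - w) * task_quality

# The same rollout (good answer, mostly sequential) scored at three training stages.
for step in (0, 500, 1000):
    print(step, staged_reward(task_quality=0.9, parallel_fraction=0.2,
                              step=step, total_steps=1000))
```

Early on, a rollout that runs mostly sequentially scores poorly even if its answer is good, which pushes the orchestrator away from serial collapse. By the end of training only answer quality counts.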

Scatter plot: Comparison of execution time between Agent Swarm and a single agent as task complexity increases. | As task complexity increases, Agent Swarm shows its strengths by significantly reducing execution time compared to single agents. | Image: Moonshot AI

The company demonstrates this with a task where K2.5 had to identify the top three YouTube creators in 100 different niches. The model independently created 100 sub-agents that researched in parallel and compiled the results into a structured table.
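The fan-out pattern in this demo can be sketched with Python's `asyncio`. This is a toy illustration under stated assumptions: `research_niche` and `orchestrate` are hypothetical names, the sub-agent returns placeholder data instead of calling a model or tools, and the real Agent Swarm decides how to split the task itself rather than following a fixed loop.

```python
import asyncio

MAX_PARALLEL = 100  # sub-agent cap reported by Moonshot AI

async def research_niche(niche, limit):
    """Hypothetical sub-agent: returns placeholder top-3 creators for one niche."""
    async with limit:
        await asyncio.sleep(0.01)  # stands in for tool calls such as search
        return {"niche": niche,
                "top_creators": [f"{niche}-creator-{i}" for i in (1, 2, 3)]}

async def orchestrate(niches):
    """Fan out one sub-agent per niche, then gather results in input order."""
    limit = asyncio.Semaphore(MAX_PARALLEL)
    return await asyncio.gather(*(research_niche(n, limit) for n in niches))

niches = [f"niche-{i}" for i in range(100)]
table = asyncio.run(orchestrate(niches))
print(len(table), table[0])
```

Because the 100 simulated lookups overlap, total wall-clock time stays close to that of a single lookup. That overlap is the intuition behind the reported speedup, although real gains depend on how evenly the work divides.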

Visual input drives coding capabilities

Moonshot AI positions K2.5 as particularly strong in coding, especially frontend development. The model can create complete user interfaces with interactive layouts and animations from simple text descriptions.

K2.5 can also reason about images and videos and generate code from them. The company shows how the model can reconstruct a website from a video or calculate and mark the shortest path through a maze image.

Benchmarks show strong performance

In the benchmarks Moonshot AI published, K2.5 hits top scores on some tests but trails the competition on others. On agentic tasks, K2.5 beats its rivals, in some cases by a wide margin. On BrowseComp, the model reaches 74.9 percent, while GPT-5.2 hits 65.8 percent and Gemini 3 Pro reaches 59.2 percent. K2.5 also leads on DeepSearchQA with 77.1 percent, ahead of Claude Opus 4.5 at 76.1 percent.

Four bar charts compare Kimi K2.5 with competitors in the categories Agents, Coding, Image, and Video. | While the model performs on par with other state-of-the-art models in image and video processing, leadership varies depending on the specific test metric. | Image: Moonshot AI

On SWE-Bench Verified for software engineering tasks, K2.5 scores 76.8 percent. GPT-5.2 and Claude Opus 4.5 reach 80 and 80.9 percent, respectively. On the multilingual SWE-Bench tests, Claude Opus 4.5 leads with 77.5 percent, followed by K2.5 at 73 percent.

For image and video benchmarks, K2.5 keeps pace with the competition. On MMMU Pro, it reaches 78.5 percent, just behind Gemini 3 Pro at 81 percent. On VideoMMMU, K2.5 scores 86.6 percent, slightly ahead of GPT-5.2 but just behind Gemini 3 Pro.

K2.5 is available through Kimi.com, the Kimi app, and an API. The weights are available for download on Hugging Face. Agent Swarm is currently in beta and available to paying users with free credits. Four modes are available: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm.

Moonshot AI was founded in 2023 and has quickly established itself as one of China's leading language model providers with the Kimi model family. The company competes with US providers like OpenAI and Anthropic as well as Chinese rivals like DeepSeek and its V3.2 model.


Source: Moonshot AI