Microsoft pits more than 100 AI agents against each other to find Windows vulnerabilities

Image: GPT-Image-2 prompted by THE DECODER

Key Points

  • Microsoft has introduced MDASH, an AI-powered security system that uses more than 100 specialized agents to automatically detect software vulnerabilities.
  • The system has already uncovered 16 new security vulnerabilities in Windows, four of them classified as critical.
  • MDASH scored 88.45 percent on the CyberGym benchmark, the highest result to date, though Microsoft hasn't disclosed which specific AI models power the system.

Microsoft has built an agentic multi-model system that uses more than 100 specialized AI agents to detect software vulnerabilities.

The security system, called MDASH (Multi-Model Agentic Scanning Harness), is designed to automatically find security vulnerabilities in software. Unlike approaches that rely on a single AI model like Claude Mythos, MDASH orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models, according to Microsoft.

On Patch Tuesday, May 12, 2026, Microsoft reported 16 new vulnerabilities (CVEs) in the Windows networking and authentication stack that MDASH discovered. The company classifies four of these as critical, including remote code execution vulnerabilities in the tcpip.sys kernel component, the IKEv2 service (ikeext.dll), netlogon.dll, and dnsapi.dll.

Ten of the 16 vulnerabilities affect kernel mode, and most can be reached over the network without authentication, Microsoft says. The company points out that its own code base is especially hard for AI models to audit: Windows, Hyper-V, and Azure are proprietary and aren't part of public training data.

More than 100 agents debate whether vulnerabilities are real

The system works in a four-stage pipeline. First, it analyzes the source code and maps the attack surface. Specialized auditor agents then scan the code for suspicious areas. In the third stage, a second group of agents, which Microsoft calls "debaters," argues for and against the exploitability of each finding. Duplicates are then merged before Evidence Leader agents attempt to trigger each vulnerability with specific inputs in the final stage.

Image: Microsoft
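The flow described above (map the attack surface, audit, debate, merge duplicates, prove) can be illustrated with a toy sketch. Microsoft has not published MDASH's code, so every name here (`Finding`, `run_pipeline`, the auditors, the debaters, the prover) is hypothetical, and simple string heuristics stand in for what would be model-driven agents.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch only: none of these names come from Microsoft.

@dataclass(frozen=True)
class Finding:
    path: str
    line_no: int
    kind: str

Auditor = Callable[[str, str], list]    # (path, code) -> findings
Debater = Callable[[Finding, str], bool]  # (finding, source line) -> "exploitable?"

def map_attack_surface(source: dict) -> list:
    """Toy heuristic: only files that read from the network are in scope."""
    return sorted(path for path, code in source.items() if "recv(" in code)

def memcpy_auditor(path: str, code: str) -> list:
    """Flags every memcpy call as a suspicious copy."""
    return [Finding(path, i, "unchecked-copy")
            for i, line in enumerate(code.splitlines(), start=1)
            if "memcpy(" in line]

def length_auditor(path: str, code: str) -> list:
    """Flags copies whose length comes from the packet (overlaps with the auditor above)."""
    return [Finding(path, i, "unchecked-copy")
            for i, line in enumerate(code.splitlines(), start=1)
            if "memcpy(" in line and "pkt_len" in line]

def advocate(finding: Finding, line: str) -> bool:
    """Argues for exploitability when the length is attacker-controlled."""
    return "pkt_len" in line

def skeptic(finding: Finding, line: str) -> bool:
    """Argues against exploitability when the length is clamped."""
    return "min(" not in line

def toy_prover(finding: Finding, line: str) -> bool:
    """Stand-in for actually triggering the bug with a crafted input."""
    return "buf" in line

def run_pipeline(source, auditors, debaters, prover) -> list:
    surface = map_attack_surface(source)
    candidates = [f for p in surface for a in auditors for f in a(p, source[p])]

    def line_of(f: Finding) -> str:
        return source[f.path].splitlines()[f.line_no - 1]

    # A finding survives the debate only with a strict majority of "exploitable" votes.
    debated = [f for f in candidates
               if 2 * sum(d(f, line_of(f)) for d in debaters) > len(debaters)]
    unique = list(dict.fromkeys(debated))  # merge duplicates, keep order
    return [f for f in unique if prover(f, line_of(f))]

SOURCE = {
    "net.c": "n = recv(sock, pkt, sizeof pkt);\n"
             "memcpy(buf, pkt, pkt_len);\n"
             "memcpy(hdr, pkt, min(pkt_len, 8));",
    "ui.c": "memcpy(title, name, name_len);",
}

RESULT = run_pipeline(SOURCE, [memcpy_auditor, length_auditor],
                      [advocate, skeptic], toy_prover)
print(RESULT)
```

In this toy run, `ui.c` is dropped because it is not on the attack surface, the clamped copy on line 3 of `net.c` loses the debate, and the two auditors' duplicate reports for line 2 collapse into a single finding.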

The pipeline is model-agnostic: when a new model comes out, it can be tested against the previous one just by changing the configuration. Plugins let experts feed in domain-specific knowledge, like kernel calling conventions or IPC trust boundaries, that no foundation model knows on its own.
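A minimal sketch of what "changing the configuration" could look like, assuming agent roles are bound to models by name. The registry, the role names, and the placeholder models `model-a` and `model-b` are all hypothetical; Microsoft has not described its configuration format.

```python
from typing import Callable, Dict

Model = Callable[[str], str]

# Hypothetical registry: stub functions stand in for real model endpoints.
MODEL_REGISTRY: Dict[str, Model] = {
    "model-a": lambda prompt: f"[model-a] {prompt}",
    "model-b": lambda prompt: f"[model-b] {prompt}",
}

def build_agents(config: Dict[str, str]) -> Dict[str, Model]:
    """Bind each agent role to the model named in the configuration."""
    return {role: MODEL_REGISTRY[name] for role, name in config.items()}

def run(config: Dict[str, str], prompt: str) -> Dict[str, str]:
    """Send the same prompt through every role's configured model."""
    return {role: model(prompt) for role, model in build_agents(config).items()}

baseline = {"auditor": "model-a", "debater": "model-a"}
candidate = {**baseline, "auditor": "model-b"}  # swap one role, no code change

print(run(baseline, "scan parser.c"))
print(run(candidate, "scan parser.c"))
```

Because only the configuration differs between the two runs, any change in results can be attributed to the swapped model rather than to the surrounding pipeline.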

Top benchmark score, but the comparison isn't apples to apples

On the public CyberGym benchmark with 1,507 real vulnerabilities, the system scored 88.45 percent, the top result on the leaderboard and roughly five points ahead of the next-best model. The comparison is misleading, though, since Microsoft is pitting an entire framework against individual models, which would also likely score higher if wrapped in a similar harness.

CyberGym benchmark: MDASH scores 88.45 percent on 1,507 real vulnerabilities, taking first place on the public leaderboard. | Image: Microsoft
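For a sense of scale, the reported percentage can be converted back into a task count. The short calculation below assumes the score is simply the share of the 1,507 benchmark tasks solved, which neither the article nor Microsoft states explicitly.

```python
TOTAL_TASKS = 1507     # from the article
SCORE_PERCENT = 88.45  # from the article

# Assumption: score = solved / total, rounded to two decimal places.
solved_estimate = round(TOTAL_TASKS * SCORE_PERCENT / 100)

# Every task count that would round to the reported score.
consistent_counts = [n for n in range(TOTAL_TASKS + 1)
                     if round(100 * n / TOTAL_TASKS, 2) == SCORE_PERCENT]

print(solved_estimate, consistent_counts)
```

Under that assumption, 1,333 solved tasks is the only count consistent with the reported 88.45 percent.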

The blog post doesn't reveal which models Microsoft used to achieve this score. The company only refers to "SOTA models" as heavy reasoners, "distilled models" as low-cost debaters, and a "second separate SOTA model" as an independent counterpart. Whether these come from OpenAI, Anthropic, Microsoft's own labs, or third-party providers remains unclear.

MDASH is backed by Microsoft's Autonomous Code Security Team. Some of its members come from Team Atlanta, the winner of the DARPA AI Cyber Challenge, according to Microsoft. For that competition, the team built an autonomous cyber reasoning system that detected and fixed bugs in complex open-source projects.

MDASH is currently available in a limited private preview for external customers. A detailed technical report is available on the Microsoft blog.

Other companies like OpenAI and Anthropic are also pushing deeper into AI cybersecurity, aiming to use their models to defend against the very threats that AI systems themselves have helped amplify.

Source: Microsoft