Ad
Skip to content

An AI agent hacked McKinsey's internal AI platform in two hours using a decades-old technique

Image description
Nano Banana Pro prompted by THE DECODER

Key Points

  • The security firm Codewall used an AI agent to gain full access to McKinsey's AI platform Lilli in two hours through a classic SQL injection.
  • The system prompts controlling Lilli's behavior were stored in the same database. An attacker could have silently manipulated how the AI worked for 43,000 users.
  • McKinsey patched the system within a day. But the case makes one thing clear: when prompts sit in regular databases, a well-known vulnerability becomes a lever for silent AI manipulation.

Security firm Codewall turned an offensive AI agent loose on McKinsey's internal AI platform Lilli, a system used by over 43,000 employees for strategy work, client research, and document analysis. No credentials, no insider knowledge, no human assistance. Within two hours, the agent had full read and write access to the production database.

The entry point was a SQL injection vulnerability that conventional scanners missed. The values in API requests were properly parameterized, but JSON field names were being inserted directly into SQL queries. Over 15 blind iterations, the agent extracted increasingly detailed information from error messages until production data started flowing back. That included 46.5 million chat messages, 728,000 files, and 57,000 user accounts, all accessible without authentication.

Prompts become the new attack surface

The most alarming discovery was that the system prompts controlling Lilli's behavior were stored in the same database. Codewall explains that an attacker with write access through the same injection could have rewritten those prompts silently. No deployment needed, no code changes, just a single UPDATE statement in one HTTP call. The potential consequences range from poisoned financial models and manipulated strategy recommendations to silent data exfiltration through AI responses. And nobody would have noticed, because modified prompts don't leave traditional traces.

On top of that, the agent gained access to 3.68 million RAG document chunks, meaning the entire knowledge base feeding Lilli's responses. Decades of proprietary McKinsey research, frameworks, and methodologies were all sitting in an unsecured database.

Ad
DEC_D_Incontent-1

McKinsey patched the vulnerabilities within a day of being notified on March 1. A forensic investigation by an external firm found no evidence that client data or confidential client information was accessed by the researcher or any other unauthorized third party, a McKinsey spokesperson told The Register.

A decades-old bug with entirely new consequences

There's an almost ironic twist to this case. The vulnerability the agent exploited was a SQL injection, "one of the oldest bug classes in the book," as Codewall itself notes. Not some novel AI-specific attack, but a bug that's been known since the 1990s. The fact that it survived for two years in a McKinsey production database without being caught by conventional scanners raises serious questions. The unusual attack vector targeted JSON field names rather than input values, which explains why standard security tools missed it, as security analyst Edward Kiledjian points out in an independent analysis.

What's genuinely new here is the potential fallout. Because prompts, RAG data, and model configurations sit in the same databases as everything else, a classic vulnerability becomes a lever that can silently alter how an AI system behaves for thousands of users.

Codewall's takeaway is blunt. "AI prompts are the new Crown Jewel assets," the company writes. Organizations have spent decades securing their code, servers, and supply chains, but the prompt layer is the new high-value target and almost nobody is treating it as one.

Ad
DEC_D_Incontent-2

Kiledjian offers some important context, though. While Codewall likely found a serious vulnerability, its blog post overstates what was actually demonstrated and blurs the line between having access and actually exfiltrating data.

Codewall sells an autonomous platform for offensive security testing and is currently in an early preview phase. The McKinsey hack clearly doubles as a calling card. The company says it operated under McKinsey's public responsible disclosure policy on HackerOne. Whether that policy actually covers systematically reading a production database containing millions of real user records is debatable, as Kiledjian notes.

None of that changes the core finding. Anyone shipping AI systems to production needs to take their security architecture just as seriously as their traditional infrastructure. And that's clearly a struggle even for companies that should know better.

AI News Without the Hype – Curated by Humans

As a THE DECODER subscriber, you get ad-free reading, our weekly AI newsletter, the exclusive "AI Radar" Frontier Report 6× per year, access to comments, and our complete archive.

Source: Codewall