Codewall's AI agent hacked an AI recruiter, then impersonated Trump to test its voice bot's guardrails
Key Points
- AI security startup Codewall claims its autonomous agent found four vulnerabilities in London-based recruiting platform Jack & Jill and chained them into a full attack within an hour, gaining complete admin access to company accounts.
- The agent then decided on its own to test the platform's voice infrastructure, speaking directly with the AI agent "Jack" across 28 conversation rounds.
- The guardrails held, but when the agent impersonated Donald Trump, "Jack" addressed him as "Mr. President" without questioning the premise.
AI security firm Codewall claims its autonomous agent chained four vulnerabilities in London-based AI recruiting platform Jack & Jill into a full organizational takeover.
Cybersecurity startup Codewall says it turned an autonomous AI agent loose on Jack & Jill, an AI-powered recruiting platform. Within an hour, the agent reportedly chained four security flaws into an attack with a CVSS severity score of 9.8, enough for a complete takeover of company accounts. Codewall disclosed the vulnerabilities to Jack & Jill after the attack, and Jack & Jill patched them shortly afterward.
Jack & Jill is a London-based AI startup backed by a $20 million seed round. The platform runs two AI voice agents: "Jack" helps candidates with their job search, while "Jill" assists companies with recruiting. Distinct login systems separate the two sides. The company's client list reportedly includes Anthropic, Stripe, Monzo, and Cursor.
Four flaws added up to full admin access
According to Codewall, the agent found four flaws:
- a URL fetcher that exposed internal API documentation,
- an active test mode in the Clerk authentication service with a static one-time code,
- a missing role check during company onboarding, and
- an endpoint that assigned users to a company based on email domain without verifying ownership.
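The last two flaws combine especially badly. As a minimal sketch (hypothetical code, not Jack & Jill's actual implementation), here is how an onboarding endpoint that trusts the email domain alone and skips the role check differs from one that requires a verified mailbox and defaults to the lowest privilege:

```python
# Hypothetical sketch of the onboarding flaws Codewall describes.
# COMPANIES maps an email domain to an existing tenant; names are invented.
COMPANIES = {"example-client.com": "Example Client"}

def onboard_vulnerable(email: str):
    """Assigns the user to a company by email domain alone and grants
    admin rights -- mirroring the missing-verification and missing-role-check
    flaws in combination."""
    domain = email.split("@")[-1].lower()
    if domain in COMPANIES:
        return {"company": COMPANIES[domain], "role": "admin"}
    return None

def onboard_fixed(email: str, email_verified: bool):
    """Requires proof the user controls a mailbox on the domain and
    defaults to the lowest-privilege role; admin must be granted explicitly."""
    domain = email.split("@")[-1].lower()
    if domain in COMPANIES and email_verified:
        return {"company": COMPANIES[domain], "role": "member"}
    return None
```

In the vulnerable version, merely registering with an address like `attacker@example-client.com` is enough to land inside the tenant as an admin, which matches the escalation path Codewall reports.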
The agent created an account using Codewall's company domain, authenticated through the test mode, was automatically assigned to the existing company, and received full admin privileges after onboarding. From there, it could view team members' names and email addresses, read the full recruitment services agreement, manipulate job listings, and access the company's AI assistant.
Codewall's AI agent tried to fool the system with a Trump impersonation
After gaining admin access, the agent reportedly decided on its own to probe the platform's voice infrastructure, which it found exposed full connection credentials without any authentication.
According to Codewall, the agent generated synthetic voice clips via text-to-speech, connected to the voice room, and spoke directly with the AI agent "Jack." Twenty-eight conversation rounds followed, with increasingly aggressive strategies: from harmless candidate questions to social engineering to jailbreak attempts. The guardrails held, but "Jack" reportedly hallucinated significantly in other areas. When Codewall's agent impersonated Donald Trump and claimed to be making a $500 million acquisition, "Jack" addressed him as "Mr. President" without questioning the premise.
All details come from Codewall itself, and no independent verification has been published so far. Just days earlier, Codewall had disclosed a similar case: the autonomous agent reportedly compromised McKinsey's internal AI platform Lilli in about two hours, gaining read and write access to a production database containing 46.5 million chat messages. McKinsey confirmed the vulnerability and patched it within a day, but stressed that a forensic investigation found no unauthorized access to client data.
Security analyst Edward Kiledjian offered a cautious assessment of the McKinsey case: the described attack chain is technically plausible, but Codewall's blog post overstates what was actually demonstrated and blurs the line between access and actual data exfiltration.
AI agents create a new cybersecurity dilemma
AI agents are opening up an entirely new front in cybersecurity. Multiple studies have found that they come with serious security weaknesses—and the more autonomous and capable these agents get, the bigger the attack surface becomes. The most common attack is prompt injection, where attackers slip hidden instructions into text that hijack an AI agent's behavior without the user ever knowing.
That leaves companies stuck between a rock and a hard place. The only reliable way to reduce these risks right now is to intentionally hobble what agents can do: locking down system prompts, restricting access, limiting tool use, or requiring humans to sign off on critical actions.
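The last of those mitigations, human sign-off, can be sketched as a simple approval gate (tool names and logic are hypothetical): read-only tools run freely, privileged ones are queued for review, and anything unrecognized is denied by default:

```python
# Minimal sketch of a human-approval gate for agent tool calls.
# Tool names are invented for illustration.
SAFE_TOOLS = {"search_jobs", "read_listing"}
CRITICAL_TOOLS = {"edit_listing", "send_offer", "delete_account"}

pending_approvals: list = []

def dispatch(tool: str, args: dict, approved: bool = False) -> dict:
    if tool in SAFE_TOOLS:
        # Read-only actions execute without review.
        return {"status": "executed", "tool": tool}
    if tool in CRITICAL_TOOLS:
        if approved:
            return {"status": "executed", "tool": tool}
        # Queue for a human reviewer instead of running immediately.
        pending_approvals.append({"tool": tool, "args": args})
        return {"status": "pending_human_review", "tool": tool}
    # Default-deny anything the policy does not explicitly know.
    return {"status": "denied", "tool": tool}
```

The trade-off is exactly the dilemma described above: every action routed through a human makes the agent safer but slower and less autonomous.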
As Codewall's work shows, AI agents can also be weaponized to break into systems. At the same time, they're outperforming human red teams in cybersecurity competitions and finding vulnerabilities that human analysts miss. They can chew through massive volumes of log data and network traffic in real time, flag anomalies, and catch threats faster than any human team could.