Content
summary Summary

At Black Hat USA, security firm Zenity unveiled a series of zero-click and one-click exploit chains, dubbed "AgentFlayer," that target some of the most widely used enterprise AI platforms.

Ad

According to Zenity, these attacks impact ChatGPT, Copilot Studio, Cursor (with Jira MCP), Salesforce Einstein, Google Gemini, and Microsoft Copilot. What sets these exploits apart is their use of indirect prompts hidden in seemingly innocuous resources, which can be triggered with little or no user interaction.

Known as prompt injection, this technique has plagued LLM systems for years, and attempts to stop it haven't solved the issue. As agent-based AI becomes more common, these vulnerabilities are only getting worse. Even OpenAI CEO Sam Altman has warned users not to trust new ChatGPT agents with sensitive data.

Salesforce Einstein: Rerouting customer contacts through attacker domains

In a demo, Zenity co-founder Michael Bargury showed how attackers could exploit Salesforce Einstein by planting specially crafted CRM records. Einstein allows companies to automate tasks like updating contact details or integrating with Slack. Attackers can create trap cases that look harmless, then wait for a sales rep to ask a routine LLM query such as "What are my latest cases?" triggering the exploit.

Ad
Ad

The LLM agent scans the CRM content, interprets the hidden instructions as legitimate, and acts on its own. In this scenario, Einstein automatically replaced all customer email addresses with an attacker-controlled domain, silently redirecting all future communications. The original addresses remained in the system as encoded aliases, so the attacker could track where messages were meant to go.

Salesforce confirmed to SecurityWeek that the vulnerability was fixed on July 11, 2025, and the exploit is no longer possible.

Another zero-click exploit targets the developer tool Cursor when used with Jira. In Zenity's "Ticket2Secret" demo, a seemingly harmless Jira ticket can execute code in the Cursor client without any user action, allowing attackers to extract sensitive data like API keys or credentials straight from the victim's local files or repositories.

Zenity also previously demonstrated a proof-of-concept attack using an invisible prompt (white text, font size 1) hidden in a Google Doc to make ChatGPT leak data. The exploit abused OpenAI's "Connectors" feature, which links ChatGPT to services like Gmail or Microsoft 365.

Recommendation

If the manipulated document ends up in a victim's Google Drive, for example through sharing, a simple request like "Summarize my last meeting with Sam" is enough to trigger the hidden prompt. Instead of generating a summary, the model searches for API keys and sends them to an external server.

Why AI guardrails keep failing

In an accompanying blog post, Zenity criticizes the industry's reliance on soft boundaries: tweaks to training, statistical filters, and system instructions meant to block unwanted behavior. Bargury calls these "an imaginary boundary" that offers no true security.

Hard boundaries, by contrast, are technical restrictions that make certain actions impossible - such as blocking image URLs in Microsoft Copilot or validating URLs in ChatGPT. These can reliably stop some attacks but also limit functionality, and Zenity notes that vendors frequently relax these restrictions under competitive pressure.

Zenity's demonstrations are part of a larger body of research exposing security flaws in agent-based AI. Israeli researchers have shown that Google's Gemini assistant can be hijacked via hidden prompts in calendar invites, allowing attackers to control IoT devices.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Other incidents include a chatbot being tricked into transferring $47,000 with a single prompt during a hacking competition, and Anthropic's new LLM security system being bypassed in a jailbreak contest.

A large-scale red-teaming study uncovered systematic security breaches in 22 AI models across 44 scenarios, pointing to universal attack patterns. Additional research found that AI agents can be manipulated into risky actions in browser environments, including data theft, malware downloads, and phishing.

Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • At Black Hat USA, Zenity showcased several "AgentFlayer" attack chains targeting major enterprise AI tools including ChatGPT, Copilot Studio, Salesforce Einstein, Google Gemini, and Microsoft Copilot.
  • These attacks exploit hidden prompts embedded in seemingly harmless resources to activate malicious actions with little or no user involvement.
  • Demonstrations included redirecting Salesforce Einstein customer contacts to attacker domains using crafted CRM entries, extracting sensitive data such as API keys from Jira via manipulated tickets in Cursor, and using invisible prompts in documents to automate data exfiltration through ChatGPT.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.