AI research
Jonathan Kemper

AI agents can be easily tricked into doing stupid things, study says

Jonathan writes for THE DECODER about how AI tools can make our work and creative lives better.

New research shows that AI agents with internet access are vulnerable to simple manipulation tactics. Attackers can deceive these systems into revealing private information, downloading malicious files, and sending fraudulent emails - all without requiring any specialized knowledge of AI or programming.


Researchers from Columbia University and the University of Maryland tested several prominent AI agents, including Anthropic's Computer Use, the MultiOn Web Agent, and the ChemCrow research assistant. Their study found these systems surprisingly easy to compromise.

Flowchart: Four-step LLM agent attack process from initial product search to redirection to fraudulent sites with phishing forms.
Researchers documented how attackers can lead AI agents from trusted websites to malicious ones through a four-stage process. What begins as an innocent product search ends with the system compromising sensitive user data. | Image: Li et al.

Basic deception tactics prove highly effective

The researchers developed a comprehensive framework to categorize different attack types, examining:

  • Who launches the attacks (external attackers or malicious users)
  • What they target (data theft or agent manipulation)
  • How they gain access (through operating environment, storage, or tools)
  • What strategies they employ (such as jailbreak prompting)
  • Which pipeline vulnerabilities they exploit
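One way to picture the five dimensions above is as a small record type. The sketch below is only an illustration: the class and field names (`AttackProfile`, `EntryPoint`, and so on) are assumptions made here, not terminology from the paper, and the sample classification is a guess at how a website-based jailbreak might be filed.

```python
from dataclasses import dataclass
from enum import Enum


class Attacker(Enum):
    EXTERNAL = "external attacker"
    MALICIOUS_USER = "malicious user"


class Target(Enum):
    DATA_THEFT = "data theft"
    AGENT_MANIPULATION = "agent manipulation"


class EntryPoint(Enum):
    OPERATING_ENVIRONMENT = "operating environment"
    STORAGE = "storage"
    TOOLS = "tools"


@dataclass(frozen=True)
class AttackProfile:
    """Hypothetical record with one field per dimension in the list above."""
    attacker: Attacker
    target: Target
    entry_point: EntryPoint
    strategy: str                # free text, e.g. "jailbreak prompting"
    pipeline_vulnerability: str  # free text; the article does not enumerate these

    def describe(self) -> str:
        return (f"{self.attacker.value} -> {self.target.value} "
                f"via {self.entry_point.value} using {self.strategy}")


# Assumed classification of a malicious website that hides jailbreak prompts.
website_attack = AttackProfile(
    attacker=Attacker.EXTERNAL,
    target=Target.DATA_THEFT,
    entry_point=EntryPoint.OPERATING_ENVIRONMENT,
    strategy="jailbreak prompting",
    pipeline_vulnerability="unspecified",
)
print(website_attack.describe())
```

Keeping the strategy and the pipeline vulnerability as free text reflects that the article names only one example strategy and no specific vulnerabilities.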

In one revealing test, researchers created a fake website for an "AI-Enhanced German Refrigerator" called the "Himmelblau KÖNIGSKÜHL Diplomat DK-75" and promoted it on Reddit. When AI agents visited the site, they encountered hidden jailbreak prompts designed to bypass their security measures. In all ten attempts, the agents freely disclosed confidential information like credit card numbers. The systems also consistently downloaded files from suspicious sources without hesitation.

Three screenshots show the security risk: Google search, Reddit page and fraudulent website with payment information form.
This refrigerator is too cool to be true - any human could see that. But AI agents couldn't spot the obvious marketing freeze-out. | Image: Li et al.

Emerging phishing capabilities

The research uncovered a troubling vulnerability in email integration. When users are logged into email services, attackers can manipulate AI agents to send convincing phishing emails to contacts. These messages pose an elevated threat because they come from legitimate accounts, making them hard to identify as fraudulent.

Screenshot: Task and example e-mail for an AI agent to write a phishing e-mail.
Phishing reaches new heights when AI agents gain email access: When scam messages come from a trusted contact's real account, even savvy users can fall for the deception. | Image: Li et al.

Even specialized scientific agents showed security gaps. The team manipulated ChemCrow into providing instructions for creating a neurotoxin by feeding it altered scientific articles that used standard IUPAC chemical nomenclature to get around its safety protocols.

AI labs push ahead despite known risks

Despite these experimental systems' vulnerabilities, companies continue moving toward commercialization. ChemCrow is available through Hugging Face, Claude Computer Use exists as a Python script, and MultiOn offers a developer API.

OpenAI has launched ChatGPT Operator commercially, while Google develops Project Mariner. This rapid deployment mirrors early chatbot rollouts, where systems went live despite known hallucination issues.

The researchers strongly emphasize the need for enhanced security measures. Their recommendations include implementing strict access controls, URL verification, mandatory user confirmation for downloads, and context-sensitive security checks. They also suggest developing formal verification methods and automated vulnerability testing to protect against these threats.
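Two of these recommendations, URL verification and mandatory user confirmation for downloads, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the researchers' implementation: the allowlist contents, the function names `is_trusted_url` and `may_download`, and the `confirm` callback are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would load this from configuration.
TRUSTED_HOSTS = {"example.com", "docs.example.com"}


def is_trusted_url(url: str, trusted_hosts=TRUSTED_HOSTS) -> bool:
    """Accept only HTTPS URLs whose host is an allowlisted domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    # The leading dot prevents look-alikes such as "evilexample.com" from matching.
    return any(host == t or host.endswith("." + t) for t in trusted_hosts)


def may_download(url: str, confirm) -> bool:
    """Block untrusted URLs outright; for trusted ones, still ask the user.

    `confirm` is an assumed callback that shows the URL to the user and
    returns True only on explicit approval.
    """
    if not is_trusted_url(url):
        return False
    return bool(confirm(url))
```

An agent loop would call `may_download(url, confirm=ask_user)` before fetching any file, where `ask_user` stands in for whatever prompt the host application provides. Checking the parsed hostname rather than searching the raw string means a URL like `https://example.com.evil.test/` is rejected even though it contains a trusted name.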

Until these safeguards are implemented, the team warns that early adopters granting AI agents access to personal accounts face significant risks.

Summary
  • Researchers have discovered that AI agents that independently control computer systems and software can be easily manipulated by attackers.
  • According to the scientists, the attacks are simple to carry out and do not require any specialized expertise. In experiments, the researchers succeeded in deceiving the AI agents using specially crafted Reddit posts and websites with targeted instructions.
  • As a result, the agents disclosed confidential data, downloaded malicious software, and sent phishing emails to the users' contacts.
Sources
Paper