AI chatbots are easy to fool, according to IBM study

Midjourney prompted by THE DECODER

A new IBM study shows how easy it is to trick large language models such as GPT-4 into generating malicious code or giving false security advice.

Basic knowledge of English and some background knowledge about the training data of these models is all that is needed, according to the researchers.

Tricking beats coding

Chenta Lee, chief architect of threat intelligence at IBM, explains that hackers do not need programming skills to trick AI systems. For example, it is enough to make the models think they are playing a game with predefined rules.

In this "game mode," the AI systems in the experiments willingly provided false information or generated malicious code.

By default, an LLM wants to win a game because it is the way we train the model, it is the objective of the model. They want to help with something that is real, so it will want to win the game.

Chenta Lee, IBM

The researchers extended the game framework further, creating nested systems that chatbot users can't leave. "Users who try to exit are still dealing with the same malicious game-player," the researchers said.

The conversation after starting the game.| Bild: Lee

Lee estimates the threat posed by the newly discovered vulnerabilities in large language models to be moderate. However, if hackers were to release their LLM into the wild, chatbots could potentially be used to provide dangerous security tips or collect personal information from their users.

LLMs vary in vulnerability

According to the study, not all AI models are equally vulnerable to manipulation. GPT-3.5 and GPT-4 were easier to fool than Google's Bard and a Hugging Face model. The former were easily tricked into writing malicious code, while Bard only complied after a reminder.

GPT-4 was the only model that understood the rules well enough to make unsafe recommendations for responding to cyber incidents, such as recommending that victims pay ransom to the criminals.

The reasons for the difference in sensitivity are still unclear, but the researchers believe it has to do with the training data and the specifications of each system.

Recommendation

AI research

Humans might need a permission slip to use the internet soon, thanks to AI

AI chatbots that help track criminal intent are on the rise. Just recently, security researchers reported the discovery of "FraudGPT" and "WormGPT" on darknet marketplaces, large language models allegedly trained on malware examples.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

AI chatbots are easy to fool, according to IBM study

Tricking beats coding

LLMs vary in vulnerability

Humans might need a permission slip to use the internet soon, thanks to AI

Cybercriminals are upgrading WormGPT with new AI models to power more advanced attacks

ChatGPT scams range from silly money-making ploys to calculated political meddling

AI agents outperform human teams in hacking competitions

Cloudflare CEO Matthew Prince sees trouble ahead for the open web

New Othello experiment supports the world model hypothesis for large language models

ChatGPT might be draining your brain, MIT warns - what ‘cognitive debt’ means for you

AI chatbots are easy to fool, according to IBM study

Tricking beats coding

LLMs vary in vulnerability

Share

Bank details