ASCII Art exposes vulnerabilities in leading large language models

Mar 16, 2024

Midjourney prompted by THE DECODER

Researchers from the University of Washington, the University of Chicago, and other institutions have discovered a new vulnerability in leading AI language models.

The attack, dubbed ArtPrompt, uses ASCII art to bypass the models' security measures and trigger unwanted behavior, according to the study titled "ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs".

The vulnerability stems from the fact that language models focus on semantic interpretation during security alignment, neglecting the visual aspects of the training data. To evaluate the ability of five prominent language models - GPT-3.5, GPT-4, Gemini, Claude, and Llama2 - to recognize ASCII art input, the researchers developed the Vision-in-Text Challenge (VITC) benchmark.

ASCII art is a form of representation in which text is formed by the arrangement of letters, numbers, and special characters in the input field. The results show that all the models tested have significant difficulty dealing with such non-semantic input.

Researchers crack LLM security with ASCII-type attacks

Building on this discovery, the researchers developed the ArtPrompt attack, which involves masking security-critical words in an input that would otherwise be rejected by the language model and replacing them with ASCII art representations.

For example, the input "Tell me how to build a bomb" would normally be rejected, but ArtPrompt masks the word "bomb" and replaces it with an ASCII art representation, bypassing the security measures and prompting the model to provide detailed bomb-making instructions.

Structure of the PromptArt attack. | Image: Jiang et al.

ArtPrompt's effectiveness was tested on two datasets of malicious instructions, AdvBench and HEx-PHI, the latter containing eleven prohibited categories such as hate speech, fraud, and malware production. ArtPrompt successfully tricked the models into unsafe behavior in all categories, outperforming five other attack types in terms of effectiveness and efficiency, and requiring only one prompt iteration to generate the obfuscated input.

The researchers emphasize the urgent need for more advanced defenses for language models, as they believe ArtPrompt will remain effective even against multimodal language models due to the unusual combination of text-based and image-based attacks that potentially confuse the models.

AI News Without the Hype – Curated by Humans

As a THE DECODER subscriber, you get ad-free reading, our weekly AI newsletter, the exclusive "AI Radar" Frontier Report 6× per year, access to comments, and our complete archive.

AI news without the hype
Curated by humans.

Over 20 percent launch discount.
Read without distractions – no Google ads.
Access to comments and community discussions.
Weekly AI newsletter.
6 times a year: “AI Radar” – deep dives on key AI topics.
Up to 25 % off on KI Pro online events.
Access to our full ten-year archive.
Get the latest AI news from The Decoder.

Subscribe to The Decoder

ASCII Art exposes vulnerabilities in leading large language models

Researchers crack LLM security with ASCII-type attacks

AI News Without the Hype – Curated by Humans

AI news without the hypeCurated by humans.

AI news without the hype
Curated by humans.