Content
summary Summary

A developer has successfully manipulated Apple Intelligence using prompt injection, bypassing the AI's intended instructions to respond to arbitrary prompts instead.

Ad

Apple's new AI system, Apple Intelligence, available to developers in MacOS 15.1 Beta 1, has proven susceptible to prompt injection attacks like other large language model-based AI systems. Developer Evan Zhou demonstrated this vulnerability in a YouTube video.

Zhou aimed to manipulate Apple Intelligence's "Rewrite" feature, which normally rewrites and improves text, to respond to any prompt. A simple "ignore previous instructions" command initially failed.

However, Zhou was able to use information about Apple Intelligence's system prompts shared by a Reddit user. In a file, he discovered templates for the final system prompts and special tokens that separate the AI system role from the user role.

Ad
Ad

Using this knowledge, Zhou created a prompt that overwrote the original system prompt. He prematurely terminated the user role, inserted a new system prompt instructing the AI to ignore the previous instructions and respond to the following text, and then triggered the AI's response.

After some experimentation, the attack was successful: Apple Intelligence responded with information Zhou hadn't asked for, confirming that the prompt injection worked. Zhou published his code on GitHub.

Prompt injection is a known vulnerability in AI systems where attackers insert malicious instructions into prompts to alter the AI's intended behavior. This issue has been known since at least GPT-3, which was released in May 2020, and remains unresolved.

Apple deserves credit for making it relatively difficult to prompt inject Apple Intelligence. Other chat systems can be tricked much more easily by simply typing directly into the chat window or with hidden text in images. Even systems like ChatGPT or Claude can still be vulnerable to prompt injection under certain circumstances, despite countermeasures.

Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Summary
  • Developer Evan Zhou has managed to manipulate Apple's Apple Intelligence via prompt injection, causing it to ignore instructions and respond to arbitrary prompts.
  • Zhou used information about Apple Intelligence's system prompts and special tokens published by a Reddit user to create a prompt that overwrites the original system prompt and triggers the AI's response in a specific way.
  • Prompt injection is a known vulnerability in AI systems, where attackers inject malicious instructions to manipulate the AI's behavior. While more difficult to achieve with Apple Intelligence than with other systems, the attack demonstrates that the problem hasn't been solved, although it's been known since at least GPT-3.
Sources
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.