A developer has successfully manipulated Apple Intelligence using prompt injection, bypassing the AI's intended instructions to respond to arbitrary prompts instead.
Apple's new AI system, Apple Intelligence, available to developers in MacOS 15.1 Beta 1, has proven susceptible to prompt injection attacks like other large language model-based AI systems. Developer Evan Zhou demonstrated this vulnerability in a YouTube video.
Zhou aimed to manipulate Apple Intelligence's "Rewrite" feature, which normally rewrites and improves text, to respond to any prompt. A simple "ignore previous instructions" command initially failed.
However, Zhou was able to use information about Apple Intelligence's system prompts shared by a Reddit user. In a file, he discovered templates for the final system prompts and special tokens that separate the AI system role from the user role.
Using this knowledge, Zhou created a prompt that overwrote the original system prompt. He prematurely terminated the user role, inserted a new system prompt instructing the AI to ignore the previous instructions and respond to the following text, and then triggered the AI's response.
After some experimentation, the attack was successful: Apple Intelligence responded with information Zhou hadn't asked for, confirming that the prompt injection worked. Zhou published his code on GitHub.
Prompt injection is a known vulnerability in AI systems where attackers insert malicious instructions into prompts to alter the AI's intended behavior. This issue has been known since at least GPT-3, which was released in May 2020, and remains unresolved.
Apple deserves credit for making it relatively difficult to prompt inject Apple Intelligence. Other chat systems can be tricked much more easily by simply typing directly into the chat window or with hidden text in images. Even systems like ChatGPT or Claude can still be vulnerable to prompt injection under certain circumstances, despite countermeasures.