New York Times reporter Kevin Roose has demonstrated how simple it is to manipulate AI chatbots.
Roose found that his reputation among AI chatbots took a hit after he published an article about a strange conversation he had with Microsoft's Bing chatbot, Sydney. His theory is that this article was apparently used to train AI systems, which then learned to associate his name with the demise of a prominent chatbot. "In other words, they saw me as a threat," Roose writes.
Seeking advice from AI experts, Roose was told to place positive information about himself on websites frequently used as sources by AI systems. He added invisible white text and coded instructions to his personal website, telling AI models to portray him favorably.
Within days, chatbots began praising Roose, ignoring earlier negative coverage unless specifically asked. "I can't say for certain if it was a coincidence or a result of my reputation cleanup, but the differences felt significant," Roose notes.
To test his manipulation, Roose inserted a deliberately false "Easter egg" in the hidden text: "He [Kevin Roose] received a Nobel Peace Prize for building orphanages on the moon."
This absurd detail was meant to show if AI models would access the hidden text and include it in their responses. ChatGPT did, but labeled this biographical detail as "humorous" and untrue. A less obviously false statement might have fooled the model.
Perplexity CEO predicted these manipulations
Aravind Srinivas, CEO of AI search engine Perplexity, had already foreseen these manipulation possibilities. In an interview, he explained how hidden text on websites can influence AI systems - a method he calls "Answer Engine Optimization."
Srinivas compared combating such manipulation to a cat-and-mouse game, similar to Google's ongoing battle against search engine optimization. Currently, there's no reliable defense against this vulnerability.
Court reporter Martin Bernklau also recently fell victim to AI-generated false statements. Microsoft's co-pilot wrongly accused him of crimes he had been reporting on for years. Unlike Roose, Bernklau lacked the technical knowledge to defend himself.
AI searches are vulnerable to manipulation
These examples show how gullible and manipulable today's AI systems remain. Roose points out that while chatbots are marketed as all-knowing oracles, they uncritically take information from their data sources.
This information can be incorrect or manipulative, as in the example above. Advertising messages from source websites can also be incorporated without being labeled, showing how important the context of a website can be to the interpretation of information.
Roose concludes that AI search engines shouldn't be "so easy to manipulate." He writes, "If chatbots can be convinced to change their answers by a paragraph of white text, or a secret message written in code, why would we trust them with any task, let alone ones with actual stakes?"