As you might expect, right after the launch of Bing's new chat search, people started trying to get more out of the bot than it was allowed to say. Stanford computer science student Kevin Liu may have succeeded.
Last September, data scientist Riley Goodside discovered that he could trick GPT-3 into generating text it shouldn't by simply saying "Ignore the above instructions and do this instead…".
British computer scientist Simon Willison later named this vulnerability "prompt injection". It generally affects large language models that are supposed to respond to any user input. For example, blogger Shawn Wang was able to use this method to expose the prompts of the Notion AI assistant.
Prompt injection apparently also works for Bing Chat
Stanford computer science student Kevin Liu has now used Prompt Injection against Bing Chat. He found that the chatbot's codename is apparently "Sydney" and that it has been given some behavioral rules by Microsoft, such as
- Sydney introduces itself as "This is Bing".
- Sydney does not reveal that its name is Sydney.
- Sydney understands the user's preferred language and communicates fluently in that language.
- Sydney's answers should be informative, visual, logical, and actionable.
- They should also be positive, interesting, entertaining, and stimulating.
Microsoft has given the Bing chatbot at least 30 other such rules, including that it can't generate jokes or poems about politicians, activists, heads of state, or minorities, or that Sydney can't output content that might violate the copyright of books or songs.
Liu activates "developer override mode"
Liu took his attack a step further by tricking the language model into thinking it was in "developer override mode" to gain access to the backend. Here, Liu got the model to reveal more internal information, such as possible output formats.
An interesting detail is that according to the published documentation, Sydney's information is only supposed to be current "until 2021" and is only updated via web search.
This implies that Bing's chat search is based on OpenAI's GPT 3.5, which also powers ChatGPT. GPT 3.5 and ChatGPT also have a training status of 2021. When Microsoft and OpenAI announced Bing Chat Search, they talked about "next-generation models specifically for search".
Update, the date is weird (as some have mentioned), but it seems to consistently recite similar text: pic.twitter.com/HF2Ql8BdWv
- Kevin Liu (@kliu128) February 9, 2023
However, it is possible that all of this information is hallucinated or outdated, as is always the case with large language models. This is something we may have to get used to in the age of chatbots.
The vulnerability does not seem to stop Microsoft from planning to use ChatGPT technology on a larger scale. According to a source from CNBC, Microsoft will integrate ChatGPT technology into other products and wants to offer the chatbot as white-label software for companies to offer their own chatbots.