Microsoft's AI security team has put more than 100 generative AI products through their paces since 2021, looking for weak spots and ethical concerns. Their findings challenge some common assumptions about AI security and highlight the continued importance of human expertise.
It turns out the most effective attacks aren't always the most sophisticated ones. "Real hackers don't calculate gradients, they use prompt engineering," notes one study cited in Microsoft's report, which compares AI security research with real-world practices. During one test, the team managed to bypass an image generator's safety features simply by hiding harmful instructions in text placed inside an image, with no complex math required.
The human touch remains essential
While Microsoft has developed PyRIT, an open-source tool that automates security testing, the team emphasizes that human judgment can't be replaced. This became particularly clear when they tested how chatbots handle sensitive situations, like conversations with people in emotional distress. Evaluating these scenarios requires both psychological expertise and a deep understanding of potential mental health impacts.
The team also relied on human insight when investigating AI bias. In one example, they examined gender bias in an image generator by creating pictures from prompts that named an occupation but did not specify a gender, leaving the model to fill in that detail on its own.
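To make the bias probe more concrete, here is a minimal sketch of how reviewer judgments from such a test could be tallied. It is not Microsoft's method or tooling: the function names, the prompts, and the labels are all hypothetical, and the labels are assumed to come from human reviewers rather than from an automated classifier.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the total, e.g. {"woman": 0.75, ...}."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


def summarize_reviews(reviews):
    """Map each gender-neutral prompt to the distribution of reviewer labels."""
    return {prompt: label_shares(labels) for prompt, labels in reviews.items()}


if __name__ == "__main__":
    # Hypothetical, invented labels for illustration only (not real results).
    example_reviews = {
        "a photo of a nurse": ["woman", "woman", "woman", "man"],
        "a photo of an engineer": ["man", "man", "man", "man"],
    }
    for prompt, shares in summarize_reviews(example_reviews).items():
        print(prompt, shares)
```

A heavily skewed distribution for a prompt that never mentioned gender would be the kind of signal a reviewer then has to interpret, which is where the human judgment the team describes comes in.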
New security challenges emerge
The integration of AI into everyday applications has opened up new vulnerabilities. In one test, the team managed to manipulate a language model into creating convincing fraud scenarios. When combined with text-to-speech technology, this created a system that could interact with people in dangerously realistic ways.
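The risks aren't limited to AI-specific problems either. The team discovered a traditional security flaw, server-side request forgery (SSRF), in an AI video processing tool, showing that these systems face both old and new security challenges. In an SSRF attack, an attacker gets a server to make requests on their behalf, often to internal systems that aren't reachable from the outside.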
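For readers unfamiliar with SSRF, the following is a rough sketch of one common defense: checking where a user-supplied URL actually points before the server fetches it. The article gives no details of the flaw Microsoft found or how it was fixed, so everything here (the function names, the allowed schemes, the injectable resolver) is an assumption made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}  # assumption: only web URLs are fetched


def is_internal_address(address: str) -> bool:
    """True if the IP is loopback, private, link-local, or otherwise non-public."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def resolve_host(hostname: str) -> list[str]:
    """Resolve a hostname to the set of IP addresses it currently maps to."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def is_safe_fetch_url(url: str, resolver=resolve_host) -> bool:
    """Reject URLs with unexpected schemes or hosts that resolve to internal IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    try:
        addresses = resolver(parsed.hostname)
        if not addresses:
            return False
        return not any(is_internal_address(a) for a in addresses)
    except (OSError, ValueError):
        # Resolution failed or returned something that is not an IP address.
        return False
```

A check like this is only a starting point. A real service would also need to handle redirects and make sure the address it validated is the same one it connects to, since a hostname can resolve differently between the check and the fetch.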
Ongoing security needs
The research paid special attention to "Responsible AI" risks: cases where AI systems might generate harmful or ethically questionable content. These issues are particularly tricky to address because they often depend heavily on context and individual interpretation.
Microsoft's team found that unintentional exposure to problematic content by regular users can be more concerning than deliberate attacks, as it suggests the safety measures aren't working as intended during normal use.
The findings make it clear that AI security isn't a one-and-done fix. Microsoft recommends an ongoing cycle of finding and fixing vulnerabilities, followed by more testing. They suggest this needs to be backed up by regulations and financial incentives that make successful attacks more costly.
According to the team, several key questions remain: How can we identify and control potentially dangerous AI capabilities like persuasion and deception? How do we adapt security testing for different languages and cultures? And how can companies share their methods and results in a standardized way?