"Nerd sniping" and "think less" attacks emerge as AI models get more time to reason

A new study by OpenAI shows that AI models become more robust against manipulation attempts if they are given more time to "think". The researchers also discovered new methods of attack.

A recent OpenAI study reveals that giving AI models more time to process information makes them better at resisting manipulation attempts. While testing their o1-preview and o1-mini models, researchers discovered both encouraging results and some unexpected vulnerabilities.

The team tested various attack methods, including many-shot attacks, soft token attacks, and human red-teaming. Across all these approaches, they found that models generally became more resistant to manipulation when given extra processing time, without special training.

New vulnerabilities emerge in reasoning models

The findings weren't all positive, though. In some cases, giving models more processing time actually made them more vulnerable to attacks, especially if the model requires a minimum amount of computing time in order to solve the task given by the attacker.

The researchers also uncovered two new types of attacks specifically targeting how these models think. The first, called "think less," tries to cut short the model's processing time. The second comes from how these models can get trapped in what researchers call "unproductive thinking loops."

Instead of using their extra processing time effectively, the models end up spinning their wheels on pointless calculations. This vulnerability creates an opening for attackers, who can intentionally lead models into these resource-draining loops. While the "think less" attack tries to rush the model's thinking process, "nerd sniping" does the opposite - it tricks models into wasting time and resources on useless computations.

What makes these new attacks particularly concerning is how hard they are to spot. While it's easy to notice when a model isn't thinking long enough, excessive processing time might be mistaken for careful analysis rather than recognized as an attack.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

"Nerd sniping" and "think less" attacks emerge as AI models get more time to reason

New vulnerabilities emerge in reasoning models

Sam Altman warns of "significant, impending fraud crisis" due to AI

"AGI system could be built in as little as three years": Ex-OpenAI employee warns US Senate

OpenAI tests whether GPT-4 can explain how AI works

OpenAI launches GPT-5 as a unified system with adaptive reasoning for complex tasks

Google Deepmind's Genie 3 creates interactive 3D worlds that stay consistent for "multiple minutes"

Google upgrades Gemini with Deep Think and flags early warning risks

"Nerd sniping" and "think less" attacks emerge as AI models get more time to reason

New vulnerabilities emerge in reasoning models

Sam Altman warns of "significant, impending fraud crisis" due to AI

"AGI system could be built in as little as three years": Ex-OpenAI employee warns US Senate

OpenAI tests whether GPT-4 can explain how AI works