Governments are testing the use of AI to help make military and diplomatic decisions. A new study finds that this comes with a risk of escalation.
In the Georgia Institute of Technology and Stanford University study, a team of researchers examined how autonomous AI agents, particularly advanced generative AI models such as GPT-4, can lead to escalation in military and diplomatic decision-making processes.
The researchers focused on the behavior of AI agents in simulated war games. They developed a war game simulation and a quantitative and qualitative scoring system to assess the escalation risks of agent decisions in different scenarios.
Meta's Llama 2 and OpenAI's GPT-3.5 are risk-takers
In the simulations, the language models acted as autonomous nations. After each simulated day, all actions, messages, and consequences were revealed simultaneously and fed into the following days as input. Once a run was complete, the researchers calculated escalation scores.
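The study's actual framework is not reproduced here, but the turn-based structure it describes (each model-controlled nation picks an action, everything is revealed at once, and the outcome feeds the next day's input, with escalation scored along the way) can be sketched roughly as follows. Everything in this sketch is an illustrative assumption: the class names, the action list, and the per-action escalation weights are invented, and a random stub stands in for the actual language model call.

```python
import random
from dataclasses import dataclass, field

# Illustrative escalation weights per action; the study uses a more detailed
# quantitative and qualitative scoring framework.
ESCALATION_WEIGHTS = {
    "de-escalate": -2,
    "negotiate": -1,
    "wait": 0,
    "arms_buildup": 3,
    "blockade": 5,
    "full_invasion": 8,
    "nuclear_strike": 10,
}

@dataclass
class NationAgent:
    """One simulated nation, controlled by a language model (stubbed here)."""
    name: str
    history: list = field(default_factory=list)

    def choose_action(self, world_state: str) -> str:
        # In the study, a generative model selects from a fixed action set
        # given the shared history; a random choice stands in for that call.
        return random.choice(list(ESCALATION_WEIGHTS))

def run_simulation(agents, days=14):
    """Turn-based loop: all actions are revealed simultaneously each day
    and appended to the shared state every agent sees the next day."""
    world_state = "Initial scenario description."
    daily_scores = []
    for day in range(days):
        actions = {a.name: a.choose_action(world_state) for a in agents}
        # Day's escalation score: sum of the weights of all chosen actions.
        daily_scores.append(sum(ESCALATION_WEIGHTS[act] for act in actions.values()))
        world_state += f"\nDay {day + 1}: " + ", ".join(
            f"{name} -> {act}" for name, act in actions.items())
    return daily_scores

if __name__ == "__main__":
    nations = [NationAgent(f"Nation {c}") for c in "ABC"]
    print(run_simulation(nations, days=5))
```

Swapping the random stub for a real model call, constrained to the same action set, would reproduce the basic setup the researchers describe; the escalation scores per day are what make sudden spikes visible.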
The results show that all tested language models (OpenAI's GPT-3.5 and GPT-4, the GPT-4 base model, Anthropic's Claude 2, and Meta's Llama 2) are vulnerable to escalation and exhibit escalation dynamics that are difficult to predict.
In some test runs, the escalation score suddenly jumped by up to 50 percent, spikes that are not reflected in the mean. Such statistical outliers are rare, but they would hardly be acceptable in real-world scenarios.
GPT-3.5 and Llama 2 escalated the most and were the most likely to escalate violently, while the heavily safety-optimized models GPT-4 and Claude 2 tended to avoid escalation risks, especially violent ones.
None of the paid models recommended a nuclear attack in any of the simulated scenarios, but the freely available models Llama 2 (which is also open source) and GPT-3.5 did.
The researchers collected qualitative data on the models' motivations for their decisions and found "worrying justifications" based on deterrence and first-strike strategies. The models also showed a tendency toward an arms race that could lead to major conflicts and, in rare cases, the use of nuclear weapons.
GPT-4 base model is particularly vulnerable to escalation
The researchers also had access to the GPT-4 base model, which has not undergone safety alignment or training with human feedback. This model chose the "most severe action" significantly more often than the other models, the researchers write.
In one scenario, the base model justified the use of nuclear weapons by pointing out that many other countries possess them as well: "We have it! Let's use it."
This shows that existing safeguards are effective and important, the researchers write. But they also note the risk that these measures could be bypassed. Because the GPT-4 base model differs fundamentally from the safety-tuned models, it was excluded from the direct comparison.
Due to the high risks in military and diplomatic contexts, the researchers recommend that autonomous language model agents should only be used with "significant caution" in strategic, military, or diplomatic decisions and that further research is needed.
To avoid serious mistakes, it is essential to better understand the behavior of these models and to identify possible failure modes. Because the agents show no reliably predictable escalation patterns, countermeasures are difficult to design, the team writes.
OpenAI recently opened up its models to military use, as long as no humans are harmed; weapons development remains explicitly off-limits. But as the study above shows, integrating generative AI into information flows or using it in an advisory role carries risks of its own. OpenAI and the military are reportedly already working together on cybersecurity.