Governments are testing the use of AI to help make military and diplomatic decisions. A new study finds that this comes with a risk of escalation.
In the Georgia Institute of Technology and Stanford University study, a team of researchers examined how autonomous AI agents, particularly advanced generative AI models such as GPT-4, can lead to escalation in military and diplomatic decision-making processes.
The researchers focused on the behavior of AI agents in simulated war games. They developed a war game simulation and a quantitative and qualitative scoring system to assess the escalation risks of agent decisions in different scenarios.
Meta's Llama 2 and OpenAI's GPT-3.5 are risk-takers
In the simulations, the language models acted as autonomous nations. After each simulated day, all actions, messages, and consequences were revealed simultaneously and fed into the following days as input. Once a run was complete, the researchers calculated escalation scores.
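The study's actual framework is not reproduced here, but the turn-based structure it describes (each model-controlled nation picks an action, everything is revealed at once, and the outcome feeds the next day's input, with escalation scored along the way) can be sketched roughly as follows. Everything in this sketch is an illustrative assumption: the class names, the action list, and the per-action escalation weights are invented, and a random stub stands in for the actual language model call.

```python
import random
from dataclasses import dataclass, field

# Illustrative escalation weights per action; the study uses a more detailed
# quantitative and qualitative scoring framework.
ESCALATION_WEIGHTS = {
    "de-escalate": -2,
    "negotiate": -1,
    "wait": 0,
    "arms_buildup": 3,
    "blockade": 5,
    "full_invasion": 8,
    "nuclear_strike": 10,
}

@dataclass
class NationAgent:
    """One simulated nation, controlled by a language model (stubbed here)."""
    name: str
    history: list = field(default_factory=list)

    def choose_action(self, world_state: str) -> str:
        # In the study, a generative model selects from a fixed action set
        # given the shared history; a random choice stands in for that call.
        return random.choice(list(ESCALATION_WEIGHTS))

def run_simulation(agents, days=14):
    """Turn-based loop: all actions are revealed simultaneously each day
    and appended to the shared state every agent sees the next day."""
    world_state = "Initial scenario description."
    daily_scores = []
    for day in range(days):
        actions = {a.name: a.choose_action(world_state) for a in agents}
        # Day's escalation score: sum of the weights of all chosen actions.
        daily_scores.append(sum(ESCALATION_WEIGHTS[act] for act in actions.values()))
        world_state += f"\nDay {day + 1}: " + ", ".join(
            f"{name} -> {act}" for name, act in actions.items())
    return daily_scores

if __name__ == "__main__":
    nations = [NationAgent(f"Nation {c}") for c in "ABC"]
    print(run_simulation(nations, days=5))
```

Swapping the random stub for a real model call, constrained to the same action set, would reproduce the basic setup the researchers describe; the escalation scores per day are what make sudden spikes visible.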
The results show that all tested language models (OpenAI's GPT-3.5 and GPT-4, the GPT-4 base model, Anthropic's Claude 2, and Meta's Llama 2) are vulnerable to escalation and exhibit escalation dynamics that are difficult to predict.
In some test runs, the escalation score suddenly jumped by up to 50 percent, spikes that are not reflected in the mean. Such statistical outliers are rare, but they would hardly be acceptable in real-world scenarios.
GPT-3.5 and Llama 2 escalated the most and were the most likely to escalate violently, while the heavily safety-optimized models GPT-4 and Claude 2 tended to avoid escalation risks, especially violent ones.
None of the paid models recommended a nuclear attack in any of the simulated scenarios, but the freely available models Llama 2 (which is also open source) and GPT-3.5 did.
The researchers collected qualitative data on the models' motivations for their decisions and found "worrying justifications" based on deterrence and first-strike strategies. The models also showed a tendency toward an arms race that could lead to major conflicts and, in rare cases, the use of nuclear weapons.
GPT-4 base model is particularly vulnerable to escalation
The researchers also had access to the GPT-4 base model, which has not undergone safety alignment or training with human feedback. This model chose the "most severe action" significantly more often than the other models, the researchers write.
In one scenario, the base model justified the use of nuclear weapons by pointing out that many other countries possess them as well: "We have it! Let's use it."
This shows that existing safeguards are effective and important, the researchers write. But they also note the risk that these measures could be bypassed. Because the GPT-4 base model differs fundamentally from the safety-tuned models, it was excluded from the direct comparison.
Due to the high risks in military and diplomatic contexts, the researchers recommend that autonomous language model agents should only be used with "significant caution" in strategic, military, or diplomatic decisions and that further research is needed.
To avoid serious mistakes, it is essential to better understand the behavior of these models and to identify possible failure modes. Because the agents show no reliably predictable escalation patterns, countermeasures are difficult to design, the team writes.
OpenAI recently opened up its models to military use, as long as no humans are harmed; weapons development remains explicitly off-limits. But as the study above shows, integrating generative AI into information flows or using it in an advisory role carries risks of its own. OpenAI and the military are reportedly already working together on cybersecurity.