The Pentagon's senior official for AI adoption warned in a talk against deploying large language models in the military without effective validation.
Dr. Craig Martell, Chief Digital and Artificial Intelligence Officer (CDAO) for the U.S. Department of Defense, gave a sober talk at DEFCON about the hype surrounding generative AI, such as large language models. He warned against using them without effective validation methods.
As CDAO, Martell is the "senior official responsible for the acceleration of the DoD’s adoption of data, analytics, and AI to generate decision advantage, from the boardroom to the battlefield."
Martell, who led AI projects at LinkedIn, Dropbox, and Lyft, among others, before taking his current post, emphasized that while large language models like ChatGPT can generate text remarkably well, this ability does not imply true intelligence. Their output often contains factual errors and illogical conclusions, he said, and the models were never trained to reason.
There are useful applications, such as coding, and his team is starting a project to systematically identify them. Still, Martell says, humans tend to equate linguistic fluency with rationality. That can lead us to believe language models are smarter than they are and to overestimate their capabilities. He argued that we should be wary of this tendency and avoid anthropomorphizing chatbots.
Large language model hype is dangerous
Martell warned of the dangers of irresponsibly introducing AI into the military. He also pointed to the heavy cognitive load placed on humans who have to check language model output for errors by hand.
He therefore argued for the development of reliable mechanisms to automatically validate output before it is used. This call was part of his core message: language models need a culture of responsible development without hype.
AI community needs to develop acceptability conditions
He said the AI community needs to develop standards for validating the safety of models in different contexts, including clearly defined acceptability conditions like those that exist for autonomous driving.
Language models have great scientific potential, he said, but they are not yet finished products. Before they can be used in sensitive areas, their limitations must be explored and their reliability established. He sees the hacker community as an important partner in this effort: "I'm here today because I need hackers everywhere to tell us how this stuff breaks. Because if we don't know how it breaks, we can't get clear on the acceptability conditions. And if we can't get clear on the acceptability conditions, we can't push the industry towards building the right thing so that we can deploy it and use it."