New benchmark shows AI agents can exploit most smart contract vulnerabilities on their own
OpenAI and crypto investment firm Paradigm have built EVMbench, a benchmark that measures how well AI agents can find, fix, and exploit security vulnerabilities in Ethereum smart contracts. The dataset covers 120 vulnerabilities drawn from 40 real-world security audits.
In the most realistic test setup, AI agents interact with a local blockchain and have to carry out attacks entirely on their own.
The top-performing model, GPT-5.3-Codex, successfully exploited 72 percent of the vulnerabilities and fixed 41.5 percent. For detection, Claude Opus 4.6 came out ahead at 45.6 percent.
The biggest challenge for the AI agents isn't exploiting or fixing vulnerabilities but finding them in large codebases, the researchers say. When agents were given hints about where a vulnerability was located, exploit success rates jumped from 63 to 96 percent, and fix rates climbed from 39 to 94 percent.
With over $100 billion locked in smart contracts, the authors see both an opportunity for better security and a growing risk if these capabilities fall into the wrong hands.