OpenAI pits AI agents against each other to red team smart contracts

• OpenAI has launched a new benchmark that evaluates how well different AI models detect, patch, and even exploit security vulnerabilities found in crypto smart contracts. • OpenAI released the “EVMbench: Evaluating AI Agents on Smart Contract Security” paper on Wednesday, in collaboration with crypto investment firm Paradigm and crypto security firm OtterSec, to evaluate how much the AI agents could theoretically exploit from 120 smart contract vulnerabilities. • Anthropic’s Claude Opus 4.6 came out on top with an average “detect award” of $37,824, followed by OpenAI’s OC-GPT-5.2 and Google’s Gemini 3 Pro at $31,623 and $25,112, respectively. • While AI agents are becoming increasingly efficient at handling basic tasks, OpenAI said it is becoming more important to evaluate their performance in “economically meaningful environments.” “Smart contracts secure billions of dollars in assets, and AI agents are likely to be transformative for both attackers and defenders.” “We expect agentic stablecoin payments to grow, and help ground it in a domain of emerging practical importance,” OpenAI added. • Circle CEO Jeremy Allaire predicted on Jan. • 22 that billions of AI agents will be transacting with stablecoins for everyday payments on behalf of users within five years, while former Binance boss Changpeng “CZ” Zhao also recently tipped that crypto would end up being the “native currency for AI agents.” The need to test agentic AI performance in spotting security vulnerabilities comes as attack

Sources:

https://cointelegraph.com/news/openai-benchmark-ai-agents-detect-smart-contract-flaws?utm_source=rss_feed&utm_medium=rss&utm_campaign=rss_partner_inbound