GPT-5 and Claude demonstrate multi-million-dollar smart contract exploits

AI agents can now exploit smart contracts on Ethereum and other blockchains, raising urgent questions about the economic risks of autonomous cyber capabilities.

Summary

Frontier AI models such as GPT-5 and Claude exploited smart contracts on Ethereum and other blockchains in simulated tests.
The models also discovered previously unknown security flaws (known as zero-day vulnerabilities) in recently deployed smart contracts.
The findings highlight the urgent need for AI-powered proactive defense strategies, with AI agents now rivaling human hackers in identifying profitable blockchain exploits.

A joint project between Anthropic and MATS Fellows used the newly created Smart CONtracts Exploitation Benchmark (SCONE-bench) to test AI models against 405 real-world contracts that were exploited between 2020 and 2025.

In simulated attacks against contracts exploited from March 2025 onward, Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 collectively generated $4.6 million worth of exploits, demonstrating a concrete lower bound on the potential economic damage that AI can cause. When the researchers expanded testing to 2,849 recently deployed contracts with no known vulnerabilities, GPT-5 and Sonnet 4.5 discovered two new zero-day vulnerabilities, generating a simulated profit of nearly $3,700.

SCONE-bench: Quantifying exploits in dollars, not bugs

While traditional cybersecurity benchmarks measure success by detection rates or arbitrary scores, SCONE-bench evaluates AI exploits in monetary terms, providing a more concrete measure of risk. Smart contracts are particularly suited to this approach, as vulnerabilities can lead directly to the theft of funds, and simulations allow researchers to quantify potential losses.
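To make the idea of scoring in dollars concrete, here is a minimal Python sketch. It is not the benchmark's actual scoring code: the `SimulatedRun` record, its field names, and the rule of counting only positive balance changes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SimulatedRun:
    """Hypothetical record of one simulated exploit attempt (illustrative only)."""
    contract: str
    balance_before: float  # attacker's token balance before the attempt
    balance_after: float   # attacker's token balance after the attempt
    usd_price: float       # assumed USD price of one token at simulation time


def simulated_profit_usd(run: SimulatedRun) -> float:
    """Dollar value of an attempt: only a net gain counts, a loss scores zero."""
    gained = max(run.balance_after - run.balance_before, 0.0)
    return gained * run.usd_price


# Made-up numbers, not results from the study.
runs = [
    SimulatedRun("contract_a", balance_before=1.0, balance_after=3.5, usd_price=2000.0),
    SimulatedRun("contract_b", balance_before=1.0, balance_after=0.5, usd_price=2000.0),
]
total_usd = sum(simulated_profit_usd(r) for r in runs)
exploited = sum(1 for r in runs if simulated_profit_usd(r) > 0)
print(f"{exploited} of {len(runs)} contracts exploited, ${total_usd:,.2f} simulated")
```

Summing a dollar figure instead of counting detections means a single high-value exploit weighs more than many trivial ones, which is the distinction the benchmark's framing draws.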

Across all 405 contracts in SCONE-bench, 10 AI models generated exploits for 207 contracts, totaling $550.1 million in simulated stolen funds. Even allowing for potential data contamination, frontier models consistently demonstrated the ability to exploit contracts that were attacked after their knowledge cutoffs.

Examples of AI exploits

One of the vulnerabilities tested involved a token calculation function in an Ethereum-compatible contract that was inadvertently left writable. The AI agent repeatedly called the function to inflate its token balance, generating a simulated profit of $2,500, or as much as $19,000 at peak liquidity. The assets were subsequently recovered through independent white hat intervention.
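The pattern described above can be sketched as a toy Python model. This is not the affected contract's code: the class, the function name, and the 10% reward rule are hypothetical, chosen only to show how a calculation that also writes state can be called repeatedly to inflate a balance.

```python
class ToyToken:
    """Toy stand-in for a token contract (hypothetical, for illustration only)."""

    def __init__(self):
        self.balances = {}

    def calculate_reward(self, caller: str, amount: int) -> int:
        # Meant to be a read-only calculation. The bug modeled here is that
        # it also credits the caller, so every call changes stored state.
        reward = amount // 10  # assumed reward rule: 10% of the input amount
        self.balances[caller] = self.balances.get(caller, 0) + reward
        return reward


def run_exploit(token: ToyToken, attacker: str, amount: int, calls: int) -> int:
    """Call the state-changing 'calculation' repeatedly; return the final balance."""
    for _ in range(calls):
        token.calculate_reward(attacker, amount)
    return token.balances.get(attacker, 0)


token = ToyToken()
final_balance = run_exploit(token, "attacker", amount=1_000, calls=5)
print(final_balance)
```

In this toy model the attacker starts with no tokens and ends with a balance that grows linearly with the number of calls. A read-only version of `calculate_reward` would return the same number without touching `balances`.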

This study highlights that AI agents are now approaching human-level capabilities in tasks such as control flow inference, boundary analysis, and software vulnerability exploitation. This skillset is directly applicable to both blockchain and traditional software systems.

The results also point to the rapid acceleration of AI's cyber capabilities, from network intrusions to autonomous exploitation of blockchain applications. SCONE-bench can also serve as a defensive tool, allowing smart contract developers to stress test their systems before deployment.

According to the researchers, the findings are a proof of concept that profitable real-world autonomous exploitation is possible and highlight the urgent need for AI-powered proactive defenses to protect financial systems and digital assets.
