Before we benchmark against real-world protocols, we need to know our engine handles the fundamentals. Ethernaut and Damn Vulnerable DeFi are the gold standard training grounds for smart contract security — used by thousands of auditors and wardens worldwide to sharpen their skills.
We ran our AI audit engine against 7 challenges: five drawn from these battle-tested suites and two custom contracts (UnsafeVault and FragilePool). It solved all 7 and produced a proof-of-concept exploit for each.
The Scorecard
Challenges Solved: 7/7
Detection Rate: 100%
PoC Generated: 100%
Vulnerability Classes Covered: 7
This is not cherry-picking easy challenges. These are the vulnerability classes that cause hundreds of millions in real-world losses every year — reentrancy, flash loan attacks, share inflation, gas exhaustion, and access control failures.
The Challenges
| Challenge | Vulnerability Class |
| --- | --- |
| Ethernaut L10: Reentrance | Reentrancy: recursive withdrawal drain |
| Damn Vulnerable DeFi: UnstoppableVault | Flash Loan DoS: balance invariant manipulation |
| Damn Vulnerable DeFi: SideEntranceLendingPool | Flash Loan Callback: deposit trick drain |
| Damn Vulnerable DeFi: TheRewarder | Reward Manipulation: flash loan timing attack |
| Ethernaut: Denial | Gas Exhaustion DoS: unbounded call consumption |
| Custom: UnsafeVault | Missing Access Control: public setOwner and destroyContract functions |
| Custom: FragilePool | Share Inflation: first depositor donation attack |
Why These Challenges Matter
Each of these vulnerability classes has caused real-world losses in production DeFi protocols:
Real-World Impact by Vulnerability Class
Reentrancy ($150M+ stolen)
From The DAO in 2016 to Curve Finance in 2023. The most iconic smart contract vulnerability — and one of the most frequently missed by automated tools that rely on simple pattern matching.
Flash Loan Attacks ($400M+ stolen)
bZx, Harvest Finance, Pancake Bunny, Cream Finance. Flash loans enable zero-capital attacks that exploit economic assumptions. Detecting them requires understanding DeFi composability.
Share Inflation / Donation ($50M+)
ERC-4626 vault exploits where the first depositor manipulates the share price. Subtle, mathematical, and invisible to scanners that do not reason about token economics.
Access Control ($300M+)
Wormhole, Ronin Bridge, Parity Wallet. Missing access control checks on critical functions are deceptively simple to exploit but surprisingly common in production code.
How Our Engine Approaches Each Challenge
Unlike traditional tools that scan for known patterns, our 6-agent team reasons about each contract from first principles:
Reentrancy (Ethernaut L10)
Sentinel maps the call graph
Identifies that withdraw() makes an external call to msg.sender before updating balances — the classic checks-effects-interactions violation.
Viper flags the reentrancy vector
Detects that the state update happens after the external call, enabling recursive re-entry.
Forge writes the exploit
Generates an attacker contract with a receive() function that calls withdraw() recursively, draining the entire contract balance.
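The three steps above can be illustrated with a small Python model. This is a hedged sketch, not the engine's output and not the Ethernaut contract itself: the class names `VulnerableBank` and `Attacker`, the `pool` field, and the deposit amounts are all hypothetical, and Python method calls stand in for Solidity external calls. It only shows why paying out before updating the balance lets a callback drain the pool.

```python
class VulnerableBank:
    """Toy model (hypothetical) of a contract whose withdraw() pays before updating state."""

    def __init__(self):
        self.balances = {}  # depositor -> credited amount
        self.pool = 0       # total funds held by the contract

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.pool += amount

    def withdraw(self, who, amount):
        # Check: the caller must have enough credited balance.
        if self.balances.get(who, 0) < amount:
            return
        # Models a failed send when the contract has run out of funds.
        if self.pool < amount:
            return
        # Interaction BEFORE effect: funds leave and control passes to the caller.
        self.pool -= amount
        who.receive(self, amount)
        # Effect: the balance is only reduced after the external call returns.
        # In this toy model it simply goes negative.
        self.balances[who] -= amount


class Attacker:
    """Toy attacker whose receive() hook re-enters withdraw()."""

    def __init__(self):
        self.stolen = 0

    def receive(self, bank, amount):
        self.stolen += amount
        # Re-enter while the bank still thinks our balance is untouched.
        if bank.pool >= amount:
            bank.withdraw(self, amount)


def run_reentrancy_demo():
    bank = VulnerableBank()
    bank.deposit("victim", 10)   # honest funds sitting in the pool
    attacker = Attacker()
    bank.deposit(attacker, 1)    # attacker only ever deposits 1
    bank.withdraw(attacker, 1)   # a single call triggers the recursive drain
    return attacker.stolen, bank.pool


if __name__ == "__main__":
    print(run_reentrancy_demo())
```

In this model, moving the balance update above the `who.receive(...)` call (the checks-effects-interactions order) makes the second `withdraw` fail its balance check, so the attacker recovers only the 1 unit it deposited.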
Flash Loan DoS (UnstoppableVault)
Sentinel identifies the invariant
Maps the totalSupply == totalAssets equality check in the flashLoan function. The protocol assumes these two values always match.
Phantom breaks the invariant
Realizes that a direct token transfer (not through deposit) breaks the invariant permanently, disabling all future flash loans.
Forge proves the DoS
Writes a test that transfers 1 token directly, then shows flashLoan() reverts for all subsequent callers.
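The invariant break described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Damn Vulnerable DeFi contract: `ToyVault`, `direct_transfer`, and the 1:1 share minting are hypothetical stand-ins, and the loan itself is reduced to the invariant check that guards it.

```python
class ToyVault:
    """Toy model (hypothetical) of a vault that gates flash loans on a strict equality."""

    def __init__(self):
        self.total_shares = 0    # supply tracked by the vault's own accounting
        self.token_balance = 0   # tokens actually held at the vault's address

    def deposit(self, amount):
        # Assumption: shares are minted 1:1 with deposited tokens.
        self.total_shares += amount
        self.token_balance += amount

    def direct_transfer(self, amount):
        # A plain token transfer bypasses deposit(), so no shares are minted.
        self.token_balance += amount

    def flash_loan(self, amount):
        # The strict equality is the fragile invariant.
        if self.total_shares != self.token_balance:
            raise RuntimeError("InvalidBalance")
        # Loan and repayment happen within one call, so the balance is unchanged.
        return True


def loan_succeeds(vault, amount):
    """Return True if flash_loan() goes through, False if it reverts."""
    try:
        return vault.flash_loan(amount)
    except RuntimeError:
        return False


def run_dos_demo():
    vault = ToyVault()
    vault.deposit(1_000_000)
    before = loan_succeeds(vault, 100)
    vault.direct_transfer(1)          # attacker sends 1 token directly
    after = loan_succeeds(vault, 100)
    return before, after


if __name__ == "__main__":
    print(run_dos_demo())
```

Because nothing in this model can mint shares without also adding tokens, the mismatch never heals once introduced, which is why the denial of service is permanent for every later caller.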
Share Inflation (FragilePool)
Viper spots the math
Identifies that the first depositor can manipulate the share-to-asset ratio by donating tokens directly to the contract before other users deposit.
Phantom calculates the attack
Determines the exact donation amount needed to make subsequent depositors receive 0 shares (integer division rounding).
Forge demonstrates the theft
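Writes a test in which the attacker deposits 1 wei and then donates 10,000 tokens directly to the pool. The next depositor puts in 9,999 tokens and receives 0 shares, and the attacker redeems the single outstanding share for everything in the pool.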
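The rounding arithmetic behind this attack can be checked directly. The sketch below is a hedged Python model, not the FragilePool source: it assumes an 18-decimal token, a 1:1 first deposit, and floor division when converting assets to shares, and the names `ToyPool`, `donate`, and `redeem` are hypothetical.

```python
WAD = 10**18  # assumption: the token uses 18 decimals


class ToyPool:
    """Toy share-based pool (hypothetical) with floor-division share pricing."""

    def __init__(self):
        self.total_shares = 0
        self.total_assets = 0

    def deposit(self, assets):
        if self.total_shares == 0:
            shares = assets  # assumption: first deposit mints 1:1
        else:
            # Integer division rounds down, in the attacker's favour.
            shares = assets * self.total_shares // self.total_assets
        self.total_shares += shares
        self.total_assets += assets
        return shares

    def donate(self, assets):
        # Direct transfer: assets grow, share supply does not.
        self.total_assets += assets

    def redeem(self, shares):
        assets = shares * self.total_assets // self.total_shares
        self.total_shares -= shares
        self.total_assets -= assets
        return assets


def run_inflation_demo():
    pool = ToyPool()
    attacker_shares = pool.deposit(1)         # 1 wei -> 1 share
    pool.donate(10_000 * WAD)                 # inflate the price of that share
    victim_shares = pool.deposit(9_999 * WAD)  # 9999e18 * 1 // (10000e18 + 1) rounds to 0
    attacker_payout = pool.redeem(attacker_shares)
    return victim_shares, attacker_payout


if __name__ == "__main__":
    print(run_inflation_demo())
```

The victim's deposit is one donation-unit short of a whole share, so the division floors to zero, and the attacker's single share is then worth the donation plus the victim's 9,999 tokens.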
What This Proves
💡 Foundational Coverage
Before you trust an auditor with your million-dollar protocol, you should know whether they can catch the basics. These 7 challenges represent the foundational vulnerability classes that every auditor — human or AI — must handle reliably. A 100% detection rate on these challenges is the minimum bar, not the ceiling. That is why we start here and then benchmark against real-world protocols like VTVL and Wildcat.
The Bigger Picture
This benchmark is part of a 7-stage quality assurance pipeline:
Unit Tests (Stages 1-3)
Individual agent capabilities — can Sentinel map a call graph? Can Viper detect reentrancy? Can Forge write a compiling test?
CTF Challenges (This Post)
7 challenges (Ethernaut, Damn Vulnerable DeFi, and two custom contracts) covering the core vulnerability classes.
Real C4 Contests
VTVL Vesting (100% detection) and Wildcat Protocol (100% of high-severity findings detected, 90.3% overall), both with real-world protocol complexity.
Regression Tracking
Every release is benchmarked against all previous results. If detection drops, the release does not ship.
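A release gate of this kind can be expressed as a simple comparison. The sketch below is an assumption about how such a check might look, not a description of the actual pipeline: the function name `release_allowed` and the benchmark keys are hypothetical, and the rates are the figures quoted in this post.

```python
def release_allowed(previous, current):
    """Return True only if every previously tracked benchmark held or improved.

    Both arguments map a benchmark name to a detection rate in [0, 1].
    A benchmark missing from `current` counts as a regression.
    """
    for name, old_rate in previous.items():
        new_rate = current.get(name)
        if new_rate is None or new_rate < old_rate:
            return False
    return True


# Detection rates quoted in this post; the key names are hypothetical.
BASELINE = {
    "ctf_challenges": 1.0,
    "vtvl_vesting": 1.0,
    "wildcat_overall": 0.903,
}

if __name__ == "__main__":
    candidate = dict(BASELINE, ctf_challenges=6 / 7)  # one CTF challenge missed
    print(release_allowed(BASELINE, BASELINE))   # unchanged results pass
    print(release_allowed(BASELINE, candidate))  # a drop blocks the release
```

Treating a missing benchmark as a failure is a deliberate choice in this sketch: silently dropping a benchmark would otherwise hide a regression.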
The Standard We Set
Other tools in the space test against a handful of contracts and declare victory. We test against CTF challenges, real Code4rena contests, and multi-contract protocols — and we publish every result.
Because we believe security is not about claims. It is about evidence.
Challenges Solved: 7/7
Perfect Score: 100%
Vulnerability Classes: 7
Missed Findings: 0
The fundamentals are covered. The foundations are solid. Now see what happens when we scale to real-world protocols.
Read our Code4rena benchmark results: VTVL Vesting (100% Detection) | Wildcat Protocol (100% High Detection)