AI Audit vs Ethernaut + DVD: 7/7 Perfect Score

Before we benchmark against real-world protocols, we need to know our engine handles the fundamentals. Ethernaut and Damn Vulnerable DeFi are the gold standard training grounds for smart contract security — used by thousands of auditors and wardens worldwide to sharpen their skills.

We ran our AI audit engine against 7 challenges: five from these public sets and two custom contracts. It solved all 7, with a proof-of-concept exploit for each.

ℹ️ Scope and Ground Truth

Targets are public CTF challenges from Ethernaut and Damn Vulnerable DeFi, plus two custom contracts (UnsafeVault and FragilePool). Each challenge has a single canonical solution; "solved" means our engine produced a passing Foundry exploit reproducing the intended attack.

The Scorecard

7/7 Challenges Solved

100% Detection Rate

100% PoC Generated

Vulnerability Classes Covered

This is not cherry-picking easy challenges. These are the vulnerability classes that cause hundreds of millions in real-world losses every year — reentrancy, flash loan attacks, share inflation, gas exhaustion, and access control failures.

The Challenges

Challenge (Vulnerability Class)

• Ethernaut L10: Reentrance (Reentrancy: recursive withdrawal drain)
• Damn Vulnerable DeFi: UnstoppableVault (Flash Loan DoS: balance invariant manipulation)
• Damn Vulnerable DeFi: SideEntranceLendingPool (Flash Loan Callback: deposit trick drain)
• Damn Vulnerable DeFi: TheRewarder (Reward Manipulation: flash loan timing attack)
• Ethernaut: Denial (Gas Exhaustion DoS: unbounded call consumption)
• Custom: UnsafeVault (Missing Access Control: public setOwner and destroyContract)
• Custom: FragilePool (Share Inflation: first depositor donation attack)
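
To make one of the less obvious entries concrete, here is a minimal Python sketch of the "deposit trick" behind the flash loan callback class. It is a simplified model rather than the challenge's Solidity or our engine's output; `ToyLendingPool`, `deposit_trick`, and the account name are hypothetical names chosen for illustration. The assumption is that the pool's repayment check looks only at its own balance, so a borrower can "repay" by depositing the borrowed funds and still be credited for them.

```python
class ToyLendingPool:
    """Simplified model of a flash-loan pool that also accepts deposits."""

    def __init__(self, funds):
        self.balance = funds   # funds held by the pool
        self.credits = {}      # per-account withdrawable credit

    def deposit(self, who, amount):
        self.credits[who] = self.credits.get(who, 0) + amount
        self.balance += amount

    def withdraw(self, who):
        amount = self.credits.pop(who, 0)
        self.balance -= amount
        return amount

    def flash_loan(self, amount, callback):
        before = self.balance
        self.balance -= amount       # funds leave the pool
        callback(amount)             # borrower-controlled code runs here
        if self.balance < before:    # check looks only at the pool balance
            raise RuntimeError("flash loan not repaid")


def deposit_trick(pool_funds):
    """Borrow everything, 'repay' via deposit(), then withdraw the credit."""
    pool = ToyLendingPool(pool_funds)
    pool.flash_loan(pool_funds, lambda amt: pool.deposit("attacker", amt))
    stolen = pool.withdraw("attacker")
    return stolen, pool.balance
```

In this model the balance check passes because the deposit restores the pool balance, yet the same funds are now recorded as the attacker's credit.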

Why These Challenges Matter

Each of these vulnerability classes has caused real-world losses in production DeFi protocols:

Real-World Impact by Vulnerability Class

Reentrancy ($150M+ stolen)

From The DAO in 2016 to Curve Finance in 2023. The most iconic smart contract vulnerability — and one of the most frequently missed by automated tools that rely on simple pattern matching.

Flash Loan Attacks ($400M+ stolen)

bZx, Harvest Finance, Pancake Bunny, Cream Finance. Flash loans enable zero-capital attacks that exploit economic assumptions. Detecting them requires understanding DeFi composability.

Share Inflation / Donation ($50M+)

ERC4626 vault exploits where the first depositor manipulates the share price. Subtle, mathematical, and invisible to scanners that do not reason about token economics.

Access Control ($300M+)

Wormhole, Ronin Bridge, Parity Wallet. Missing access control checks on critical functions are deceptively simple to exploit but surprisingly common in production code.

How Our Engine Approaches Each Challenge

Unlike traditional tools that scan for known patterns, our specialized AI agent team reasons about each contract from first principles:

Reentrancy (Ethernaut L10)

Sentinel maps the call graph

Identifies that withdraw() makes an external call to msg.sender before updating balances — the classic checks-effects-interactions violation.

Viper flags the reentrancy vector

Detects that the state update happens after the external call, enabling recursive re-entry.

Forge writes the exploit

Generates an attacker contract with a receive() function that calls withdraw() recursively, draining the entire contract balance.
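
The three steps above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the Ethernaut contract or the generated Foundry exploit: `VulnerableBank`, `SafeBank`, `Attacker`, and `run_attack` are hypothetical names, and the callback passed to `withdraw` stands in for the attacker contract's receive() function.

```python
class VulnerableBank:
    """Toy model: withdraw() pays out before it updates the balance."""

    def __init__(self):
        self.balances = {}  # credited balance per account
        self.pool = 0       # funds actually held

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount
        self.pool += amount

    def withdraw(self, account, amount, on_receive):
        if self.balances.get(account, 0) < amount or self.pool < amount:
            return                         # check
        self.pool -= amount
        on_receive(amount)                 # interaction: external call
        self.balances[account] -= amount   # effect, applied too late
        # (in this toy model the balance simply goes negative)


class SafeBank(VulnerableBank):
    """Same bank with checks-effects-interactions ordering."""

    def withdraw(self, account, amount, on_receive):
        if self.balances.get(account, 0) < amount or self.pool < amount:
            return
        self.balances[account] -= amount   # effect first
        self.pool -= amount
        on_receive(amount)                 # interaction last


class Attacker:
    def __init__(self, bank, stake):
        self.bank, self.stake, self.loot = bank, stake, 0

    def receive(self, amount):
        self.loot += amount
        if self.bank.pool >= self.stake:   # re-enter while funds remain
            self.bank.withdraw("attacker", self.stake, self.receive)

    def attack(self):
        self.bank.deposit("attacker", self.stake)
        self.bank.withdraw("attacker", self.stake, self.receive)


def run_attack(bank):
    bank.deposit("victim", 10)
    attacker = Attacker(bank, stake=1)
    attacker.attack()
    return attacker.loot, bank.pool
```

Against `VulnerableBank` the attacker walks away with the whole pool; against `SafeBank` the re-entrant call fails the balance check and the attacker only recovers their own stake.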

Flash Loan DoS (UnstoppableVault)

Sentinel identifies the invariant

Maps the strict equality check in the flashLoan function, which requires convertToShares(totalSupply) to equal totalAssets(). The protocol assumes the two always match.

Phantom breaks the invariant

Realizes that a direct token transfer (not through deposit) breaks the invariant permanently, disabling all future flash loans.

Forge proves the DoS

Writes a test that transfers 1 token directly, then shows flashLoan() reverts for all subsequent callers.
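
A minimal Python sketch of why a single direct transfer is enough. It assumes a simplified vault that mints shares 1:1 and compares share supply to held tokens directly; the real challenge uses ERC4626 conversion, and `ToyVault`, `direct_transfer`, and `flash_loans_work` are hypothetical names used only for illustration.

```python
class ToyVault:
    """Simplified share vault with a strict accounting check in flash_loan()."""

    def __init__(self):
        self.total_supply = 0    # shares minted through deposit()
        self.token_balance = 0   # tokens actually held (total assets)

    def deposit(self, amount):
        # Assumption: shares are minted 1:1 to keep the model small.
        self.total_supply += amount
        self.token_balance += amount

    def direct_transfer(self, amount):
        # A plain token transfer to the vault address mints no shares.
        self.token_balance += amount

    def flash_loan(self, amount):
        if self.total_supply != self.token_balance:
            raise RuntimeError("InvalidBalance")
        return amount  # loan and repayment happen in one call; fees omitted


def flash_loans_work(vault):
    try:
        vault.flash_loan(1)
        return True
    except RuntimeError:
        return False
```

Usage: after `vault.deposit(1_000_000)` the loan succeeds, and after `vault.direct_transfer(1)` every later call fails, since nothing in this model can bring the two counters back into agreement.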

Share Inflation (FragilePool)

Viper spots the math

Identifies that the first depositor can manipulate the share-to-asset ratio by donating tokens directly to the contract before other users deposit.

Phantom calculates the attack

Determines the exact donation amount needed to make subsequent depositors receive 0 shares (integer division rounding).

Forge demonstrates the theft

Writes a test showing the full sequence: the attacker deposits 1 wei and donates 10,000 tokens; the next depositor puts in 9,999 tokens and receives 0 shares; the attacker then redeems their single share for everything in the pool.
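
The arithmetic can be checked with a short Python model. This is a sketch assuming an 18-decimal token and the common share formula assets * totalShares / totalAssets with floor division; FragilePool's actual code is not shown in this post, so `ToyPool` and `inflation_attack` are hypothetical stand-ins.

```python
WAD = 10**18  # assumption: the token uses 18 decimals


class ToyPool:
    """Simplified share pool that rounds minted shares down."""

    def __init__(self):
        self.total_shares = 0
        self.total_assets = 0
        self.shares = {}

    def deposit(self, who, assets):
        if self.total_shares == 0:
            minted = assets  # first depositor sets the initial ratio
        else:
            minted = assets * self.total_shares // self.total_assets
        self.shares[who] = self.shares.get(who, 0) + minted
        self.total_shares += minted
        self.total_assets += assets
        return minted

    def donate(self, assets):
        # Direct token transfer: assets rise, share count does not.
        self.total_assets += assets

    def withdraw_all(self, who):
        owned = self.shares.pop(who, 0)
        if owned == 0:
            return 0
        payout = owned * self.total_assets // self.total_shares
        self.total_shares -= owned
        self.total_assets -= payout
        return payout


def inflation_attack():
    pool = ToyPool()
    pool.deposit("attacker", 1)                         # 1 wei -> 1 share
    pool.donate(10_000 * WAD)                           # inflate share price
    victim_shares = pool.deposit("victim", 9_999 * WAD)
    attacker_payout = pool.withdraw_all("attacker")
    attacker_cost = 1 + 10_000 * WAD
    return victim_shares, attacker_payout - attacker_cost
```

With these numbers the victim's mint is 9,999e18 * 1 // (10,000e18 + 1), which floors to 0, so the attacker's single share is worth the whole pool and the profit equals the victim's 9,999-token deposit.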

What This Proves

💡 Foundational Coverage

Before you trust an auditor with your million-dollar protocol, you should know whether they can catch the basics. These 7 challenges represent the foundational vulnerability classes that every auditor — human or AI — must handle reliably. Solving every challenge is the minimum bar, not the ceiling. That is why we start here and then benchmark against real-world protocols like VTVL and Wildcat.

The Bigger Picture

This benchmark is part of a 7-stage quality assurance pipeline:

Unit Tests (Stages 1-3)

Individual agent capabilities — can Sentinel map a call graph? Can Viper detect reentrancy? Can Forge write a compiling test?

CTF Challenges (This Post)

7 challenges (five from Ethernaut and Damn Vulnerable DeFi, two custom) covering the core vulnerability classes.

Real C4 Contests

VTVL Vesting (5/5 findings reproduced) and Wildcat Protocol (6/6 HIGH + 8/10 MEDIUM, 90.3% overall) — real-world protocol complexity.

Regression Tracking

Every release is benchmarked against all previous results. If detection drops, the release does not ship.
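
A regression gate of this kind can be sketched in a few lines of Python. This is an illustration of the rule as stated, not our pipeline's code: the function names and benchmark keys are hypothetical, and the baseline counts are taken from the results quoted in this post.

```python
def regression_failures(baseline, current):
    """Return the benchmarks whose detection count dropped or is missing."""
    failures = []
    for name, expected in baseline.items():
        if current.get(name, 0) < expected:
            failures.append(name)
    return sorted(failures)


def may_ship(baseline, current):
    """A release ships only if no benchmark regressed."""
    return not regression_failures(baseline, current)


# Baseline counts from the results above (key names are hypothetical).
BASELINE = {"ctf": 7, "vtvl": 5, "wildcat_high": 6, "wildcat_medium": 8}
```

For example, a candidate release scoring `{"ctf": 7, "vtvl": 5, "wildcat_high": 6, "wildcat_medium": 7}` would be blocked, with `wildcat_medium` reported as the regression.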

The Standard We Set

Other tools in the space test against a handful of contracts and declare victory. We test against CTF challenges, real Code4rena contests, and multi-contract protocols — and we publish every result.

Because we believe security is not about claims. It is about evidence.

7/7 Challenges Solved

100% Perfect Score

The fundamentals are covered. The foundations are solid. Now see what happens when we scale to real-world protocols.

ℹ️ Want Independent Verification?

We can pair these CTF results with a manual expert review on your specific deployment — full audit output and verifier sign-off on each PoC. Request an expert proof report →

Related reading

AI Audit on Karak Restaking: 3 Additional HIGH Findings Beyond the Contest Report

AI Audit vs Code4rena veRWA: 8/8 HIGH Reproduced

AI Audit vs Code4rena BakerFi: 7/7 HIGH Reproduced