Benchmark · Smart Contracts · Web3

100% High Detection on a 2,300-Line Protocol: Wildcat Benchmark Results

February 24, 2026 · 5 min read · RedVolt Team

Finding bugs in a 50-line contract is one thing. Finding them in a real-world lending protocol with 22 interconnected Solidity files and 2,332 lines of code is something else entirely. We ran our AI audit engine against the Wildcat Protocol — one of the most complex Code4rena contests ever held — and the results speak for themselves.

The Results

  • 100% – High Detection (6/6)
  • 90.3% – Overall Detection Rate
  • 11 min – Total Audit Time
  • 144 – Human Wardens in C4 Contest
The original Code4rena 2023-10-wildcat contest attracted 144 experienced security wardens. The contest ran for days. Our AI engine completed the same audit in 11 minutes — and caught every high-severity finding the human wardens found.

The Target: Wildcat Protocol

Wildcat is a credit market protocol that enables undercollateralized on-chain lending. It is architecturally complex:

Protocol Complexity

22 Solidity files

Controllers, markets, market factories, archive contracts, lens contracts, and supporting libraries — all interconnected with shared state.

2,332 lines of code

Significant codebase with complex business logic around market creation, lending, borrowing, withdrawals, and sanctions compliance.

Multiple attack surfaces

Fee calculations, CREATE2 deployment, batch withdrawal mechanics, sanctions oracle integration, and market lifecycle management.

Cross-contract interactions

Vulnerabilities span multiple contracts. You cannot find them by reading one file at a time — you need to understand the full call graph and state flow.

This is exactly the kind of protocol where traditional automated tools fail. Static analyzers flag false positives on every file. Simple pattern matchers miss the logic bugs entirely. You need deep reasoning about protocol semantics — and that is what our multi-agent AI architecture delivers.
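To make the call-graph point concrete, here is a minimal Python sketch of cross-contract reachability analysis. The contract and function names are invented for illustration; this is not the production engine, just the shape of the problem: auditing one entry point means reasoning about every function it can transitively reach, across contract boundaries.

```python
from collections import defaultdict

def build_call_graph(contracts):
    """Build a cross-contract call graph.

    `contracts` maps contract name -> {function: [callee, ...]}, where a
    callee is written "Contract.function". Purely illustrative input shape.
    """
    graph = defaultdict(set)
    for contract, funcs in contracts.items():
        for func, callees in funcs.items():
            caller = f"{contract}.{func}"
            for callee in callees:
                graph[caller].add(callee)
    return graph

def reachable(graph, start):
    """All functions transitively reachable from `start`: the set an
    auditor must reason about when reviewing that entry point."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen

# Toy example: a market-close path that crosses three contracts.
contracts = {
    "Controller": {"closeMarket": ["Market.close"]},
    "Market": {"close": ["FeeLib.computeFees", "Market._settleBatches"],
               "_settleBatches": []},
    "FeeLib": {"computeFees": []},
}
graph = build_call_graph(contracts)
print(sorted(reachable(graph, "Controller.closeMarket")))
```

Even in this toy, a bug in `FeeLib.computeFees` is invisible unless you follow the chain from `Controller.closeMarket`; a file-at-a-time reviewer never sees the connection.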

Every High-Severity Finding: Detected

  • H-01: Fee calculation escape during market close – DETECTED
  • H-02: CREATE2 codehash bypass in market deployment – DETECTED
  • H-03: Missing maxTotalSupply enforcement and closeMarket exposure – DETECTED
  • H-04: Zero withdrawalBatchDuration race condition – DETECTED
  • H-05: Sanctions evasion via token transfer to clean address – DETECTED
  • H-06: Borrower draining sanctioned lender funds – DETECTED
Six high-severity findings. Six detections. No misses.
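Finding H-02 involves CREATE2 deployment, and a general property of CREATE2 is worth spelling out: the deployed address commits to the hash of the *init code*, not of the code that ends up at the address. Here is a sketch of the standard EIP-1014 address derivation. Note one loud assumption: Ethereum uses Keccak-256, which is not in Python's standard library, so `hashlib.sha3_256` stands in for the hash structure only and the resulting addresses do not match real chain addresses.

```python
import hashlib

def create2_address(deployer: str, salt: bytes, init_code: bytes) -> str:
    """EIP-1014 CREATE2 derivation, structurally:
    address = hash(0xff ++ deployer ++ salt ++ hash(init_code))[12:].
    NOTE: real Ethereum uses Keccak-256; hashlib.sha3_256 is a stdlib
    stand-in here, so outputs differ from on-chain addresses.
    """
    assert len(salt) == 32
    h = lambda b: hashlib.sha3_256(b).digest()
    preimage = b"\xff" + bytes.fromhex(deployer[2:]) + salt + h(init_code)
    return "0x" + h(preimage)[12:].hex()

addr = create2_address("0x" + "11" * 20, b"\x00" * 32, b"\x60\x00")
print(addr)  # deterministic: same deployer + salt + init code, same address
```

The design consequence: because the address commits to init code rather than runtime code, any verification scheme that treats the two as interchangeable leaves room for init code that deploys something other than what was checked, which is the general hazard class behind codehash-style bypasses.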

Medium-Severity Coverage

  • 8/10 – Medium Findings Detected
  • 90.3% – Combined Detection Rate
  • 18 – Total Findings Reported
  • 53 – Raw Agent Findings

Our engine detected 8 of the 10 medium-severity findings — for a combined high+medium detection rate of over 90%. The 53 raw findings from individual agents were deduplicated and prioritized into 18 final findings by the Scribe report synthesizer.
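The deduplication step can be pictured with a simplified model: group raw findings by a shared key and keep the highest severity in each group. The field names and grouping key below are assumptions for illustration, not the Scribe agent's actual logic.

```python
def deduplicate(raw_findings):
    """Collapse raw per-agent findings that describe the same issue.

    A finding is a dict with 'contract', 'issue', and 'severity';
    duplicates share the (contract, issue) key, and the merged finding
    keeps the highest severity seen. Simplified model, not Scribe's code.
    """
    rank = {"high": 2, "medium": 1, "low": 0}
    merged = {}
    for f in raw_findings:
        key = (f["contract"], f["issue"])
        if key not in merged or rank[f["severity"]] > rank[merged[key]["severity"]]:
            merged[key] = f
    # Report highest-severity findings first.
    return sorted(merged.values(), key=lambda f: -rank[f["severity"]])

raw = [
    {"contract": "Market", "issue": "fee escape on close", "severity": "high"},
    {"contract": "Market", "issue": "fee escape on close", "severity": "medium"},
    {"contract": "Factory", "issue": "codehash check bypass", "severity": "high"},
]
final = deduplicate(raw)
print(len(final))  # 2: the two Market duplicates merged, highest severity kept
```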

Why This Benchmark Is Different

Most AI security tools benchmark against simple, well-known vulnerability patterns — reentrancy, integer overflow, access control on single functions. Wildcat is fundamentally different:

Typical AI Benchmarks

  • Single-contract targets
  • Known patterns (reentrancy, overflow)
  • 50-200 lines of code
  • Synthetic/CTF targets
  • No ground truth comparison

Wildcat Benchmark

  • 22 interconnected contracts
  • Novel logic bugs (fee escape, sanctions evasion)
  • 2,332 lines of code
  • Real protocol from Code4rena contest
  • Scored against 144-warden contest results

Multi-Agent Coordination at Scale

At this scale, a single-agent approach falls apart. You need specialized agents that each bring domain expertise:

01 – Sentinel maps 142 functions across 20 contracts

It builds a complete call graph, token-flow analysis, and role hierarchy. This structural understanding is critical for cross-contract bug detection.

02 – Viper hunts logic bugs

Viper identifies the fee calculation escape in market close (H-01) and the withdrawal-batch race condition (H-04), bugs that require understanding temporal state transitions.
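To see why a zero withdrawalBatchDuration is dangerous, consider a toy batching model. The names and mechanics below are invented for illustration, not Wildcat's actual code: the point is only that with a zero duration, every batch expires the instant it opens, so the shared-batch semantics the protocol relies on collapse.

```python
class Batch:
    """Toy withdrawal batch: requests queue into a batch that becomes
    payable once `now >= expiry`. Illustrative model only."""
    def __init__(self, created_at, duration):
        self.expiry = created_at + duration
        self.requests = []

def request_withdrawal(batches, lender, amount, now, duration):
    """Append a request to the open batch, opening a new one if the
    current batch has already expired."""
    if not batches or now >= batches[-1].expiry:
        batches.append(Batch(now, duration))
    batches[-1].requests.append((lender, amount))
    return batches[-1]

batches = []
# With duration == 0 a batch expires in the same instant it is created,
# so every request lands in its own immediately-payable batch and
# withdrawals are no longer shared (and pro-rated) across one batch.
for lender in ("alice", "bob", "carol"):
    request_withdrawal(batches, lender, 100, now=5, duration=0)
print(len(batches))  # 3: one batch per request
```

With any positive duration, all three requests at `now=5` would share a single batch; the zero case silently changes who gets paid what, and when.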

03 – Warden audits access control

Warden catches the CREATE2 codehash bypass (H-02) and the missing maxTotalSupply enforcement (H-03), architectural flaws in the deployment and governance layer.

04 – Phantom finds economic exploits

Phantom discovers the sanctions evasion via token transfer (H-05) and the borrower drain attack (H-06), multi-step exploits that require adversarial economic reasoning.
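The sanctions-evasion class of bug (H-05) comes down to *where* screening happens. Here is a toy model, with invented names and deliberately simplified logic, of the general pattern: if the oracle is consulted at withdrawal but not on token transfer, sanctioned funds can hop to a fresh address first.

```python
SANCTIONED = {"0xBadLender"}  # stand-in for a sanctions oracle lookup

def withdraw(balances, account):
    """Toy withdrawal: the sanctions check screens only the account
    calling withdraw. Illustrative pattern, not Wildcat's actual code."""
    if account in SANCTIONED:
        raise PermissionError("sanctioned account blocked")
    amount, balances[account] = balances[account], 0
    return amount

def transfer(balances, src, dst, amount):
    # The token transfer path performs no sanctions screening here,
    # which is the gap this class of finding exploits.
    balances[src] -= amount
    balances[dst] = balances.get(dst, 0) + amount

balances = {"0xBadLender": 500}
transfer(balances, "0xBadLender", "0xFreshAddr", 500)  # hop to a clean address
print(withdraw(balances, "0xFreshAddr"))  # 500: screening bypassed
```

The fix in this model is equally simple to state: screen on every balance-moving path, not just the exit. Spotting where a real protocol fails to do so requires tracing token flow across contracts.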

05 – Scribe synthesizes the report

Scribe deduplicates 53 raw agent findings into 18 final findings, assigns severity ratings, and generates the professional PDF audit report.

Each agent ran for approximately 3 minutes. The entire audit completed in 11 minutes.
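The arithmetic (five agents at roughly three minutes each, eleven minutes total) implies the analysis agents overlap in time. As an illustration only, and not the published orchestration code, here is a scaled-down sketch of that shape: the four analysis agents run concurrently, then Scribe synthesizes their output.

```python
import concurrent.futures
import time

def run_agent(name, seconds):
    """Stand-in for one specialist agent; a real agent would be doing
    model-driven analysis rather than sleeping."""
    time.sleep(seconds)
    return f"{name}: done"

# 0.2 s stands in for ~3 minutes per agent. Because the four analysis
# agents run in parallel and only Scribe waits for them, wall-clock
# time is far below the sum of individual agent runtimes.
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_agent, n, 0.2)
               for n in ("Sentinel", "Viper", "Warden", "Phantom")]
    results = [f.result() for f in futures]
report = run_agent("Scribe", 0.2)  # synthesis runs after the others finish
elapsed = time.perf_counter() - start
print(len(results), report)  # 4 analysis results, then the Scribe report
```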

What Competing with 144 Wardens Means

⚠️ Putting It in Perspective

In the original Code4rena contest, 144 experienced security wardens competed over several days to find vulnerabilities in Wildcat Protocol. Our AI engine matched their high-severity detection rate in 11 minutes. This is not about replacing human auditors — it is about establishing a measurable performance baseline that the industry has never had before.

We believe every audit — human or AI — should be held to measurable standards. Not vague claims of "thorough review" or "comprehensive analysis." Actual detection rates against known ground truth. That is the standard we hold ourselves to, and we challenge every other auditor to do the same.

The Challenge

If your audit provider — whether human or AI — cannot tell you their detection rate against standardized benchmarks, you should ask why. We publish ours because we have nothing to hide.

  • 100% – High Severity Found
  • 90.3% – Overall Detection
  • 11 min – Completed In
  • $0 – Missed Critical Bugs

The numbers do not lie. And we will keep publishing them.

Start a Free Smart Contract Audit

Want to secure your application or smart contract?

Request an Expert Review