AI Audit vs Code4rena veRWA: 8/8 HIGH Reproduced

Vote-escrow governance is a specific kind of nightmare to audit. Locks, epochs, weight decay, delegation, gauge systems — the bugs live in the interaction between time and state, not in individual functions. veRWA's Code4rena 2023-08 contest surfaced eight HIGH-severity findings across 754 lines of Solidity. Our AI engine ran against the same codebase and detected every one.

ℹ️Scope and Ground Truth

Comparison is against the official Code4rena 2023-08-verwa report (8 HIGH and 3 MEDIUM findings as published). The "additional" division-by-zero issue described below was not surfaced as a HIGH or MEDIUM in the official report — it may exist in the QA / Low-severity stream, which we did not exhaustively cross-check.

The Results

100%

HIGH Detection (8/8)

6/8

Rated HIGH by our calibrator

754

Lines of Solidity

Total Findings in Report

This is the kind of protocol where pattern-matching scanners produce noise without signal. You need an engine that reasons about temporal state, role hierarchies, and adversarial incentives. That is what we built.

The Target: veRWA Governance

veRWA is a Curve-style vote-escrow governance system adapted for real-world asset gauges. It consists of four contracts tightly woven together:

Governance Surface

VotingEscrow

Users lock tokens for time-weighted voting power. Lock duration drives voting weight via linear decay.

GaugeController

Weekly weight snapshots per gauge. Voters direct their voting power. Admins can add or remove gauges.

LendingLedger

Epoch-based reward accrual for depositors into whitelisted markets. Depends on VotingEscrow balances at epoch boundaries.

Delegation

Voters can delegate their voting power. The delegation unwind path is where the worst bugs lived.

Every HIGH-Severity Finding

C4 HIGH Finding

•H-01: Weekly reward theft via point-in-time balance snapshots
•H-02: Vote multiplication via delegation (GaugeController votes not invalidated)
•H-03: Adding a gauge without admin init loses all voting power
•H-04: Delegated votes locked when owner lock expires
•H-05: DoS on all gauge functions via slope underflow
•H-06: Forced long lock times to undelegate back to self
•H-07: Missing access control in LendingLedger checkpoint functions
•H-08: Gauge removal permanently locks user voting power

Our Detection

•DETECTED · HIGH · PoC verified
•DETECTED · HIGH · PoC verified
•DETECTED · HIGH · PoC verified
•DETECTED · HIGH · PoC verified
•DETECTED · HIGH
•DETECTED · HIGH
•DETECTED · Medium (severity calibrated)
•DETECTED · Medium (severity calibrated)

Every one of C4's HIGH-severity findings was caught. Six of the eight were graded HIGH by our severity calibrator; two were graded Medium.

On the two Medium-graded HIGHs

H-07 (public checkpoint griefing) and H-08 (removed-gauge power lock) are real bugs with real user impact, but neither drains funds by itself. Our calibrator treats findings that require additional user action to exploit, or that result in disenfranchisement rather than direct theft, as Medium by default. C4's wardens rated them HIGH. Both views are defensible; we lean stricter.

The important point: the bugs were detected. A reader can see the full technical description, the attack path, and a concrete fix in the report — regardless of the numeric rating.

An Additional HIGH-Severity Issue Not in the Contest's HIGH/MEDIUM Report

Our engine identified a division-by-zero in LendingLedger.claim() that permanently DoS's reward claims when any epoch in the claim range has zero total market balance. This is reachable any time a market briefly empties out — all lenders withdrawing simultaneously, or a deposit-then-withdraw within the same epoch. Once reached, the user cannot claim any further rewards without forfeiting everything up to the zero-balance epoch.

We ship a passing Foundry PoC for it. We have not exhaustively checked C4's QA / Low-severity submissions; if a warden flagged this at a lower severity it would not appear in the HIGH/MEDIUM report we benchmarked against.

The New Detection Stack

This benchmark runs on our current production pipeline:

Multi-tool static analysis — Slither + Mythril + Aderyn run in parallel for each audit. Each catches a different class of bug; aggregating them eliminates single-tool blind spots.
Protocol-class specialist checklists — The engine identifies the protocol family (vote-escrow, lending, AMM, oracle consumer) from the code and auto-injects the 12-section specialist checklist of well-known footguns for that family. veRWA triggered the vote-escrow, epoch-reward, and lending checklists together.
Attacker-persona reasoning — A dedicated agent reframes the codebase from the attacker's side: "I have $1M of capital; where does the money go?" Catches motivation-driven bugs that defensive scanners miss.
Multi-step flow analysis — Separate agent dedicated to finding bugs that require 2-4 function calls to trigger. This is how we detect the delegation deadlock and the lock-extension grief.
Cross-check ensemble — Every HIGH and CRITICAL finding is reviewed by a second model to independently validate the attack path. Report 5 saw three findings independently confirmed with 0.95 confidence.
Proof-of-concept verification — Foundry test harnesses are auto-generated for every HIGH finding, compiled, and executed. If the exploit doesn't reproduce, the finding is flagged for human review before publication.

The Honest Comparison

What We Caught

•Vote multiplication via delegation (bug class identified)
•Permanently-locked delegated funds (PoC verified)
•Slope-underflow DoS on all gauge functions
•Epoch-boundary reward-theft via balance snapshots
•Division-by-zero in claim() — additional HIGH not in C4's HIGH/MEDIUM report

What We Rated Differently

•Removed-gauge power lock: HIGH → Medium
•Public checkpoint griefing: HIGH → Medium
•
•
•

What This Means in Practice

Our engine now matches Code4rena's published HIGH coverage on governance protocols and surfaces additional issues beyond the contest's HIGH/MEDIUM report. For a team shipping governance code, this is the difference between a one-shot audit at launch and continuous security monitoring on every commit.

⚠️Context on Our Severity Ratings

Our severity calibrator is deliberately stricter than the typical audit-contest scale. A vulnerability that disenfranchises a user but does not drain funds is a Medium under our rubric; several Code4rena HIGH findings land as Medium for this reason. The bugs themselves are described, exploitable, and delivered in the report. Customers who want the C4-style rating can apply their own mapping — the technical content carries over one-to-one.

Try It Yourself

100%

HIGH Detection Rate

8/8

Code4rena HIGH findings caught

PoC

Verified on every critical path

754

SLOC Audited

Governance protocols are where bugs cost the most. We published our detection rate against a real, scored benchmark. Ask your current auditor for theirs.

ℹ️Want to Verify These Findings Yourself?

Have one of our security experts run an independent review on the veRWA codebase or any of the additional findings above — full audit output, side-by-side comparison against the C4 report, and verifier sign-off on each PoC. Request an expert proof report →