We run an AI-first audit engine. It catches the majority of security findings faster than any human reviewer can, and in benchmarks against Code4rena contests it has repeatedly detected 100% of high-severity findings in under ten minutes. That's not marketing; the numbers are public and reproducible.
But if you read our reports carefully, you'll notice that every engagement beyond a certain complexity includes a human expert review phase on top of the AI pass. This isn't because we don't trust our own tooling. It's because there's a specific class of vulnerability that AI — ours or anyone else's — still doesn't find consistently. And that class is where the biggest losses happen.
Here are three cases from the last six months where the human reviewer found something the AI did not. Names and some details are anonymized, but the bugs are real and the losses would have been.
Case 1: The Governance Proposal That Voted Against Itself
Protocol type: DAO-governed lending market, ~$400M TVL
AI audit result: 6 findings (2 high, 3 medium, 1 low). All verified with Foundry PoCs. Report clean.
Human review added: 1 critical finding.
The AI did its job. It found a reentrancy guard missing on one function, it found two access-control bugs, it correctly flagged an oracle manipulation vector. Every finding was reproducible, every PoC passed, and the severity classifications matched what a senior auditor would have assigned.
What the AI missed was the governance attack.
The protocol allowed any token holder to submit a governance proposal. Proposals had a 48-hour voting window followed by a 24-hour timelock before execution. The AI correctly identified that there were no direct permission escalations in this flow — no way for an attacker to bypass voting, no way to execute without approval.
What the AI didn't reason about was the economic structure of the voting system. The protocol's governance token was also the primary lending collateral. A proposal could, during the timelock period, call a function that drained the protocol's reserves — and the same proposal, if it passed, would also modify the oracle that priced the governance token itself.
The attack the human reviewer identified:

1. Borrow governance tokens using low-priority collateral: take a large loan using a less-liquid asset as collateral, receiving governance tokens as the borrow.
2. Submit a malicious governance proposal: use the borrowed tokens to propose a change that re-prices the governance token to near-zero.
3. Wait for the vote: the proposal passes because the attacker now holds enough voting power.
4. Timelock expires and the proposal executes: the governance token drops to near-zero, and the attacker's position becomes massively overcollateralized.
5. Drain reserves via a normal borrow: use the now-cheap governance token to borrow the entire protocol treasury, then default on the original loan because the collateral is now worthless.
This is a five-step attack spanning three separate contracts and requiring the attacker to reason about the economic feedback loop between governance, collateralization, and oracle pricing. No AI we've tested, including our own, identifies this class of multi-step economic exploit reliably. Our Phantom agent generates related patterns, but the specific chain requires a human to see the loop.
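The feedback loop is easier to see as arithmetic. Here is a toy profitability model of the loop; every number and name is hypothetical, not taken from the audited protocol:

```python
# Toy model of the governance/collateral/oracle feedback loop.
# All figures are illustrative; the function names are ours, not the protocol's.

def attack_profit(gov_borrowed, gov_price_after, collateral_posted, treasury):
    """Attacker profit if a passed proposal can re-price the governance token."""
    # Cost: the less-liquid collateral locked to borrow governance tokens.
    cost = collateral_posted
    # After the proposal executes, the borrowed governance debt is valued
    # at the manipulated oracle price.
    debt_after = gov_borrowed * gov_price_after
    # The attacker drains the treasury via normal borrows, then walks away
    # from the near-worthless debt (forfeiting the posted collateral).
    recovered = treasury - debt_after
    return recovered - cost

profit = attack_profit(
    gov_borrowed=1_000_000,       # governance tokens borrowed
    gov_price_after=0.01,         # oracle price after the malicious proposal
    collateral_posted=2_000_000,  # value of the less-liquid collateral
    treasury=10_000_000,          # protocol reserves drained
)
# Positive whenever the treasury exceeds collateral + post-attack debt value.
```

The point of the sketch: each individual step is a legitimate protocol action, so per-function analysis prices none of this. The profit only appears when the oracle re-pricing and the collateral valuation are modeled in the same equation.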
The protocol fixed the issue by adding a "governance-critical" flag to the oracle contract that prevents governance proposals from mutating it during a timelock window. Small fix. Huge consequence if it had shipped without the human review.
Case 2: The Invariant That Was "Obviously" True
Protocol type: Cross-chain DEX with intents-based settlement, pre-launch
AI audit result: 4 findings (1 high, 2 medium, 1 low). All verified.
Human review added: 1 high finding via invariant analysis.
This one came from an auditor who has a habit of reading protocol documentation and asking "is this claim actually enforced by the code?"
The DEX's documentation stated that once a user's intent was posted, it could be filled by any solver, but the solver had to provide at least the minimum output amount specified by the intent. Read literally, this is trivially true — the settlement contract checks the output amount before transferring funds.
The human reviewer noticed a different invariant that the docs implied but didn't state: the solver can only fill an intent once. If the solver could partially fill, then re-fill, then re-fill again — each time claiming the fee — the fee paid by the user could exceed the total intent amount.
The AI had verified that outputs matched minimums. It hadn't verified that the sum of fees across all fills was bounded. Because that invariant wasn't written down anywhere — it was implicit in the documentation's one-fill-per-intent mental model.
The fix was a single line: `require(intent.filled == 0)` at the top of the fill function. The finding was high-severity because the economic loss was unbounded per intent. And no AI we tested flagged it, because the invariant the attack violated was one a human had to infer by reading the docs and the code side by side.
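The missing invariant is easy to state as a property test once someone writes it down. A minimal sketch, in which `Intent` and `fill` are our illustrative models rather than the DEX's real contracts:

```python
# Property sketch of the invariant the review surfaced: across any sequence
# of fills, the fees a solver claims from one intent must stay bounded.
# `Intent` and `fill` are simplified stand-ins, not the DEX's actual code.

from dataclasses import dataclass

@dataclass
class Intent:
    amount_in: int     # total input the user escrowed
    min_out: int       # minimum output the settlement contract checks
    fee_per_fill: int  # fee the solver claims on each fill
    filled: int = 0    # how much of the intent has been consumed

def fill(intent: Intent, amount: int) -> int:
    """One fill attempt; returns the fee claimed."""
    # The one-line fix from the review: an intent may be filled exactly once.
    # The buggy version omitted this check, letting fees accumulate unboundedly.
    assert intent.filled == 0, "intent already filled"
    intent.filled += amount
    return intent.fee_per_fill

def total_fees(intent: Intent, fills: list[int]) -> int:
    """Sum the fees a solver can extract from one intent via repeated fills."""
    fees = 0
    for amount in fills:
        try:
            fees += fill(intent, amount)
        except AssertionError:
            break
    return fees
```

With the check in place, a solver replaying fills collects the fee once; remove the assertion and `total_fees` grows linearly with the number of fills, which is exactly the unbounded-loss behavior the finding described.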
Case 3: The Access Control That Only Worked Under Happy Paths
Protocol type: Restaking protocol, ~$1.2B TVL at audit time
AI audit result: 11 findings (3 high, 5 medium, 3 low). Extremely thorough; no false negatives on standard patterns.
Human review added: 1 critical finding in a "safe" helper library.
The protocol used a widely-adopted access control library. The AI correctly verified that every external function had an onlyRole check, every role was granted via a proper process, and no role was granted without timelock. Textbook.
The human reviewer checked something the AI didn't: what happens if the role granting process itself fails partway through?
The specific path: when a new operator was added, the protocol granted them three roles sequentially — DEPOSITOR, WITHDRAWER, and SLASHER. These grants happened in a single function, in that order. If the third grant reverted (for any reason — out of gas, a storage collision in an unrelated contract, a malicious fallback), the operator would end up with DEPOSITOR and WITHDRAWER roles but not SLASHER.
The problem: the protocol's "remove operator" function required SLASHER permission to be revoked as part of the removal. If the operator never had SLASHER, the removal function reverted. The operator could not be removed by normal means. They had DEPOSITOR and WITHDRAWER access to the protocol in perpetuity.
This finding came from the reviewer spending two hours tracing a state diagram on paper, asking: "what are all the partial states this system can be in?" The AI had verified the happy path and each individual function. It hadn't modeled the Cartesian product of role states, which is where the bug lived.
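The reviewer's paper exercise can be mechanized. A sketch of the state-space question, where the role names follow the write-up but the grant and removal logic is our simplified model:

```python
# Enumerate the partial states reachable if the sequential role grants can
# stop partway, then ask from which states the operator can still be removed.
# Role names are from the case study; the logic is a simplified model.

ROLES = ("DEPOSITOR", "WITHDRAWER", "SLASHER")

def reachable_partial_states():
    """Grants happen in a fixed order, so a revert at step k leaves a prefix."""
    return [frozenset(ROLES[:k]) for k in range(len(ROLES) + 1)]

def can_remove(operator_roles):
    """The removal path revokes SLASHER unconditionally and reverts if absent."""
    return "SLASHER" in operator_roles

# The dangerous states: the operator holds DEPOSITOR (and possibly
# WITHDRAWER) but the removal function will revert forever.
stuck = [set(s) for s in reachable_partial_states()
         if s and not can_remove(s)]
```

Even this four-state enumeration surfaces the bug: two reachable states grant funds access with no removal path. The general technique is to enumerate reachable states after partial failures and check every cleanup path against each one, rather than verifying each function on its happy path.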
Why this keeps happening
Every current LLM — including the ones we use internally — reasons about functions in isolation and in sequence. It doesn't construct state machines where the question is "what invalid states are reachable?" The best security researchers do exactly that, and they find bugs AI misses because the bugs aren't in the code — they're in the space between the code.
What This Means for Your Audit Strategy
If your protocol is:
- A simple token (ERC-20, ERC-721, ERC-1155)
- A straightforward vault with well-understood mechanics
- A smart wallet without unusual session-key or paymaster logic
Then an AI-first audit is probably sufficient. The common bug patterns are well-covered, the cost is low, and the turnaround is measured in hours. You can run audits on every CI push. That's a strict improvement over the status quo.
If your protocol is:
- Governance-enabled with economic feedback loops
- Cross-chain with novel message types or settlement flows
- Dependent on invariants that only show up in documentation or whitepapers
- Built on top of multiple composable systems where the interactions are themselves the product
Then you need a human reviewer in the loop. Not because AI is failing — AI is catching the vast majority of findings, fast. But the highest-severity findings are increasingly in the 10% that require reasoning an AI doesn't do yet.
How RedVolt's Expert Review Works
For protocols that need the last 10%, we offer a hybrid audit structure:
1. AI Pass (hours): our multi-agent audit engine runs a complete pass, covering pattern matching, invariant checking, and Forge PoC generation for every finding.
2. Expert Review (1-3 weeks): a dedicated senior auditor, matched to your protocol's domain (DeFi / bridges / restaking / AA), reviews the AI output and extends it with manual analysis.
3. Joint Report: the final report separates AI-verified findings from human-added findings, so you know exactly what the AI caught and what required expert judgment.
4. Retest Included: after you ship fixes, the AI re-runs automatically and the expert retests the human-found findings at no additional cost.
The key detail: one expert per engagement, not a pool. Your reviewer reads your code, builds a mental model of your protocol, and stays on the project through retest. They're accountable for the 10% the AI doesn't catch.
If you're launching something that needs both speed and depth, request an expert review. We'll match you to the auditor whose past work most resembles what you're building.