We run an AI-first audit engine. It catches the majority of security findings faster than any human reviewer can, and in benchmarks against Code4rena contests it has repeatedly detected 100% of high-severity findings in under ten minutes. That's not marketing; the numbers are public and reproducible.
But if you read our reports carefully, you'll notice that every engagement beyond a certain complexity includes a human expert review phase on top of the AI pass. This isn't because we don't trust our own tooling. It's because there's a specific class of vulnerability that AI — ours or anyone else's — still doesn't find consistently. And that class is where the biggest losses happen.
Here are three cases from the last six months where the human reviewer found something the AI did not. Names and some details are anonymized, but the bugs are real and the losses would have been.
Case 1: The Governance Proposal That Voted Against Itself
Protocol type: DAO-governed lending market, ~$400M TVL
AI audit result: 6 findings (2 high, 3 medium, 1 low). All verified with Foundry PoCs. Report clean.
Human review added: 1 critical finding.
The AI did its job. It found a reentrancy guard missing on one function, it found two access-control bugs, it correctly flagged an oracle manipulation vector. Every finding was reproducible, every PoC passed, and the severity classifications matched what a senior auditor would have assigned.
What the AI missed was the governance attack.
The protocol allowed any token holder to submit a governance proposal. Proposals had a 48-hour voting window followed by a 24-hour timelock before execution. The AI correctly identified that there were no direct permission escalations in this flow — no way for an attacker to bypass voting, no way to execute without approval.
What the AI didn't reason about was the economic structure of the voting system. The protocol's governance token was also the primary lending collateral. A proposal could, during the timelock period, call a function that drained the protocol's reserves — and the same proposal, if it passed, would also modify the oracle that priced the governance token itself.
The attack the human reviewer identified:

1. Borrow governance tokens using low-priority collateral: take a large loan using a less-liquid asset as collateral, receiving governance tokens as the borrow.
2. Submit a malicious governance proposal: use the borrowed tokens to propose a change that re-prices the governance token to near-zero.
3. Wait for the vote: the proposal passes because the attacker now holds enough voting power.
4. Timelock expires and the proposal executes: the governance token drops to near-zero, and the attacker's position becomes massively overcollateralized.
5. Drain reserves via a normal borrow: use the now-cheap governance token to borrow the entire protocol treasury, then default on the original loan because the collateral is now worthless.
This is a five-step attack spanning three separate contracts and requiring the attacker to reason about the economic feedback loop between governance, collateralization, and oracle pricing. No AI we've tested, including our own, identifies this class of multi-step economic exploit reliably. Our Phantom agent generates related patterns, but the specific chain requires a human to see the loop.
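The feedback loop is easier to see as arithmetic. Here is a toy profitability model of the loop; every number and name is hypothetical, not taken from the audited protocol:

```python
# Toy model of the governance/collateral/oracle feedback loop.
# All figures are illustrative; the function names are ours, not the protocol's.

def attack_profit(gov_borrowed, gov_price_after, collateral_posted, treasury):
    """Attacker profit if a passed proposal can re-price the governance token."""
    # Cost: the less-liquid collateral locked to borrow governance tokens.
    cost = collateral_posted
    # After the proposal executes, the borrowed governance debt is valued
    # at the manipulated oracle price.
    debt_after = gov_borrowed * gov_price_after
    # The attacker drains the treasury via normal borrows, then walks away
    # from the near-worthless debt (forfeiting the posted collateral).
    recovered = treasury - debt_after
    return recovered - cost

profit = attack_profit(
    gov_borrowed=1_000_000,       # governance tokens borrowed
    gov_price_after=0.01,         # oracle price after the malicious proposal
    collateral_posted=2_000_000,  # value of the less-liquid collateral
    treasury=10_000_000,          # protocol reserves drained
)
# Positive whenever the treasury exceeds collateral + post-attack debt value.
```

The point of the sketch: each individual step is a legitimate protocol action, so per-function analysis prices none of this. The profit only appears when the oracle re-pricing and the collateral valuation are modeled in the same equation.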
The protocol fixed the issue by adding a "governance-critical" flag to the oracle contract that prevents governance proposals from mutating it during a timelock window. Small fix. Huge consequence if it had shipped without the human review.
Case 2: The Invariant That Was "Obviously" True
Protocol type: Cross-chain DEX with intents-based settlement, pre-launch
AI audit result: 4 findings (1 high, 2 medium, 1 low). All verified.
Human review added: 1 high finding via invariant analysis.
This one came from an auditor who has a habit of reading protocol documentation and asking "is this claim actually enforced by the code?"
The DEX's documentation stated that once a user's intent was posted, it could be filled by any solver, but the solver had to provide at least the minimum output amount specified by the intent. Read literally, this is trivially true — the settlement contract checks the output amount before transferring funds.
The human reviewer noticed a different invariant that the docs implied but didn't state: the solver can only fill an intent once. If the solver could partially fill, then re-fill, then re-fill again — each time claiming the fee — the fee paid by the user could exceed the total intent amount.
The AI had verified that outputs matched minimums. It hadn't verified that the sum of fees across all fills was bounded. Because that invariant wasn't written down anywhere — it was implicit in the documentation's one-fill-per-intent mental model.
The fix was a single line: `require(intent.filled == 0)` at the top of the fill function. The finding was high-severity because the economic loss was unbounded per intent. And no AI we tested flagged it, because the invariant the attack violated was one a human had to infer by reading the docs and the code side by side.
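The missing invariant is easy to state as a property test once someone writes it down. A minimal sketch, in which `Intent` and `fill` are our illustrative models rather than the DEX's real contracts:

```python
# Property sketch of the invariant the review surfaced: across any sequence
# of fills, the fees a solver claims from one intent must stay bounded.
# `Intent` and `fill` are simplified stand-ins, not the DEX's actual code.

from dataclasses import dataclass

@dataclass
class Intent:
    amount_in: int     # total input the user escrowed
    min_out: int       # minimum output the settlement contract checks
    fee_per_fill: int  # fee the solver claims on each fill
    filled: int = 0    # how much of the intent has been consumed

def fill(intent: Intent, amount: int) -> int:
    """One fill attempt; returns the fee claimed."""
    # The one-line fix from the review: an intent may be filled exactly once.
    # The buggy version omitted this check, letting fees accumulate unboundedly.
    assert intent.filled == 0, "intent already filled"
    intent.filled += amount
    return intent.fee_per_fill

def total_fees(intent: Intent, fills: list[int]) -> int:
    """Sum the fees a solver can extract from one intent via repeated fills."""
    fees = 0
    for amount in fills:
        try:
            fees += fill(intent, amount)
        except AssertionError:
            break
    return fees
```

With the check in place, a solver replaying fills collects the fee once; remove the assertion and `total_fees` grows linearly with the number of fills, which is exactly the unbounded-loss behavior the finding described.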
Case 3: The Access Control That Only Worked Under Happy Paths
Protocol type: Restaking protocol, ~$1.2B TVL at audit time
AI audit result: 11 findings (3 high, 5 medium, 3 low). Extremely thorough; no false negatives on standard patterns.
Human review added: 1 critical finding in a "safe" helper library.
The protocol used a widely-adopted access control library. The AI correctly verified that every external function had an onlyRole check, every role was granted via a proper process, and no role was granted without timelock. Textbook.
The human reviewer checked something the AI didn't: what happens if the role granting process itself fails partway through?
The specific path: when a new operator was added, the protocol granted them three roles sequentially — DEPOSITOR, WITHDRAWER, and SLASHER. These grants happened in a single function, in that order. If the third grant reverted (for any reason — out of gas, a storage collision in an unrelated contract, a malicious fallback), the operator would end up with DEPOSITOR and WITHDRAWER roles but not SLASHER.
The problem: the protocol's "remove operator" function required SLASHER permission to be revoked as part of the removal. If the operator never had SLASHER, the removal function reverted. The operator could not be removed by normal means. They had DEPOSITOR and WITHDRAWER access to the protocol in perpetuity.
This finding came from the reviewer spending two hours tracing a state diagram on paper, asking: "what are all the partial states this system can be in?" The AI had verified the happy path and each individual function. It hadn't modeled the Cartesian product of role states, which is where the bug lived.
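The reviewer's paper exercise can be mechanized. A sketch of the state-space question, where the role names follow the write-up but the grant and removal logic is our simplified model:

```python
# Enumerate the partial states reachable if the sequential role grants can
# stop partway, then ask from which states the operator can still be removed.
# Role names are from the case study; the logic is a simplified model.

ROLES = ("DEPOSITOR", "WITHDRAWER", "SLASHER")

def reachable_partial_states():
    """Grants happen in a fixed order, so a revert at step k leaves a prefix."""
    return [frozenset(ROLES[:k]) for k in range(len(ROLES) + 1)]

def can_remove(operator_roles):
    """The removal path revokes SLASHER unconditionally and reverts if absent."""
    return "SLASHER" in operator_roles

# The dangerous states: the operator holds DEPOSITOR (and possibly
# WITHDRAWER) but the removal function will revert forever.
stuck = [set(s) for s in reachable_partial_states()
         if s and not can_remove(s)]
```

Even this four-state enumeration surfaces the bug: two reachable states grant funds access with no removal path. The general technique is to enumerate reachable states after partial failures and check every cleanup path against each one, rather than verifying each function on its happy path.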
Why this keeps happening
Every current LLM — including the ones we use internally — reasons about functions in isolation and in sequence. It doesn't construct state machines where the question is "what invalid states are reachable?" The best security researchers do exactly that, and they find bugs AI misses because the bugs aren't in the code — they're in the space between the code.
What This Means for Your Audit Strategy
If your protocol is:
- A simple token (ERC-20, ERC-721, ERC-1155)
- A straightforward vault with well-understood mechanics
- A smart wallet without unusual session-key or paymaster logic
Then an AI-first audit is probably sufficient. The common bug patterns are well-covered, the cost is low, and the turnaround is measured in hours. You can run audits on every CI push. That's a strict improvement over the status quo.
If your protocol is:
- Governance-enabled with economic feedback loops
- Cross-chain with novel message types or settlement flows
- Dependent on invariants that only show up in documentation or whitepapers
- Built on top of multiple composable systems where the interactions are themselves the product
Then you need a human reviewer in the loop. Not because AI is failing — AI is catching the vast majority of findings, fast. But the highest-severity findings are increasingly in the 10% that require reasoning an AI doesn't do yet.
How RedVolt's Expert Review Works
For protocols that need the last 10%, we offer a hybrid audit structure:
1. AI Pass (hours): our multi-agent audit engine runs a complete pass, covering pattern matching, invariant checking, and Forge PoC generation for every finding.
2. Expert Review (1-3 weeks): a dedicated senior auditor, matched to your protocol's domain (DeFi / bridges / restaking / AA), reviews the AI output and extends it with manual analysis.
3. Joint Report: the final report separates AI-verified findings from human-added findings, so you know exactly what the AI caught and what required expert judgment.
4. Retest Included: after you ship fixes, the AI re-runs automatically and the expert retests the human-found findings at no additional cost.
The key detail: one expert per engagement, not a pool. Your reviewer reads your code, builds a mental model of your protocol, and stays on the project through retest. They're accountable for the 10% the AI doesn't catch.
If you're launching something that needs both speed and depth, request an expert review. We'll match you to the auditor whose past work most resembles what you're building.