AI Smart Contract Audits: Beyond the Hype
By Alex · Security researcher
Everyone is talking about AI audits. Most of what passes for "AI auditing" today is superficial — run an LLM over the source code, collect its observations, package them as findings. This is not auditing. It is autocomplete with a security vocabulary.
I want to describe what AI-powered auditing actually looks like when you build it on top of a rigorous foundation: invariant testing, coverage-guided fuzzing, and formal methods. This is what we have built at Recon, and it is fundamentally different from what the industry is selling.
The Problem with "AI Auditing"
The current wave of AI audit tools shares a common flaw: they treat security analysis as a text comprehension problem. Feed the code to a language model, ask it to find vulnerabilities, hope it catches something. This approach has three fatal limitations:
No execution context. A language model reading Solidity does not know what the code does — it knows what the code says. It cannot reason about state transitions across a sequence of transactions. It cannot determine whether a rounding error accumulates over 10,000 operations. It cannot simulate a flash loan attack path through three contracts.
No completeness guarantee. There is no way to measure what an LLM missed. It might flag a reentrancy pattern on one function and overlook an identical pattern two functions down. The output is probabilistic, not systematic.
No reproducibility. Run the same LLM on the same code twice and you may get different results. This makes it impossible to verify coverage, compare audit quality, or build confidence that all paths have been reviewed.
These are not minor limitations. They are the same problems that plague manual auditing — and the entire point of automation should be to solve them, not replicate them faster.
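The accumulation point above can be made concrete. The following toy sketch (not from any real protocol; names and rates are invented for illustration) shows integer truncation that is invisible in a single call but material after 10,000 calls, which is exactly the kind of behavior that cannot be bounded by reading the source:

```python
# Toy illustration: per-operation truncation that is negligible once but
# accumulates over 10,000 operations. An integer-math vault credits interest
# as principal * rate_bps // 10_000, truncating toward zero each time.

def accrue_once(principal: int, rate_bps: int) -> int:
    # Integer division truncates: up to 1 wei of interest is lost per call.
    return principal + principal * rate_bps // 10_000

principal = 10**6  # starting balance in wei
exact = float(principal)
for _ in range(10_000):
    principal = accrue_once(principal, 1)  # 0.01% per step, truncated
    exact *= 1.0001                        # same rate, exact arithmetic

drift = exact - principal
# Each step loses under 1 wei, but the lost wei also stop compounding, so the
# cumulative shortfall grows in a way static reading cannot quantify.
print(f"integer result: {principal}, exact: {exact:.0f}, drift: {drift:.0f}")
```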
What AI-Powered Auditing Should Look Like
At Recon, we start from a different premise. AI is not the auditor. AI is the auditor's infrastructure. It generates the artifacts that make systematic review possible.
Here is the pipeline:
Step 1: Automated Property Generation
Recon Magic analyzes the target codebase and generates invariant properties — executable Solidity assertions that express what the protocol must guarantee. Solvency conditions. Accounting consistency. Access control boundaries. Monotonicity constraints on rates and prices.
This is not pattern matching. The system analyzes the contract's storage layout, its function signatures, its external call graph, and its mathematical operations to produce properties that are specific to the protocol. A lending protocol gets properties about collateralization ratios and liquidation thresholds. A vault gets properties about share-to-asset conversion accuracy. An AMM gets properties about constant product invariance.
The result is a Chimera-based test suite — compatible with Echidna, Medusa, and Foundry — ready to run.
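To make "executable invariant property" concrete, here is a hedged sketch of what two generated properties assert, written against a toy vault model in Python purely for illustration; the actual output described above is Solidity in a Chimera-based suite, and the model and property names here are invented:

```python
# Hypothetical sketch: invariant properties over a toy vault model.
# Real generated properties are Solidity assertions; this only shows
# the shape of what they check.

class ToyVault:
    def __init__(self):
        self.total_assets = 0
        self.total_shares = 0
        self.shares = {}  # user -> share balance

    def deposit(self, user: str, assets: int) -> int:
        # First depositor mints 1:1; later depositors mint pro rata.
        if self.total_shares == 0:
            minted = assets
        else:
            minted = assets * self.total_shares // self.total_assets
        self.total_assets += assets
        self.total_shares += minted
        self.shares[user] = self.shares.get(user, 0) + minted
        return minted

def property_share_accounting(vault: ToyVault) -> bool:
    # Accounting consistency: the per-user ledger must sum to totalShares.
    return sum(vault.shares.values()) == vault.total_shares

def property_solvency(vault: ToyVault) -> bool:
    # Solvency: no shares may exist without backing assets.
    return vault.total_shares == 0 or vault.total_assets > 0

v = ToyVault()
v.deposit("alice", 1_000)
v.deposit("bob", 500)
assert property_share_accounting(v) and property_solvency(v)
```

The point of expressing guarantees this way is that a fuzzer can check them after every call in every randomized sequence, rather than relying on anyone's reading of the code.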
Step 2: Coverage Gap Analysis
Properties alone are not enough. A property suite might cover the protocol's happy path thoroughly while missing entire branches of execution. This is where the AI layer adds its most critical value.
Our system identifies coverage classes — every meaningfully distinct execution path through the code — and cross-references them against the generated properties. Paths that are not exercised by any property are flagged as gaps. The AI then generates additional properties, handler functions, and test configurations specifically designed to reach those uncovered paths.
This is informed by research on coverage-guided fuzzing. The foundational insight from Grieco et al.'s work on Echidna and Wüstholz and Christakis's Harvey is that coverage feedback transforms random testing into directed exploration. We apply the same principle to property generation itself: the AI does not generate properties blindly, it generates them to maximize the coverage of the resulting fuzzing campaign.
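The gap-analysis step reduces to a set difference. This sketch uses invented class and property names (not Recon's internal representation) to show the cross-referencing logic:

```python
# Illustrative sketch, hypothetical names: cross-reference the enumerated
# coverage classes against the paths each generated property is known to
# exercise, and surface the uncovered ones as generation targets.

coverage_classes = {
    "deposit/first_depositor",
    "deposit/subsequent",
    "withdraw/full",
    "withdraw/partial",
    "liquidate/underwater",
}

# Which classes each generated property reaches (from instrumentation).
property_reach = {
    "P1_share_accounting": {"deposit/first_depositor", "deposit/subsequent"},
    "P2_solvency": {"deposit/subsequent", "withdraw/partial"},
}

covered = set().union(*property_reach.values())
gaps = sorted(coverage_classes - covered)
# These gaps drive the next round of property and handler generation.
print(gaps)  # ['liquidate/underwater', 'withdraw/full']
```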
Step 3: Semantic Enrichment
Beyond path coverage, the AI enriches the test suite with semantic classes — the bug-relevant dimensions that pure coverage misses:
- Truncation boundaries at every division and downcast
- Overflow boundaries at every multiplication and shift
- Reentrancy surfaces at every external call
- Oracle manipulation vectors at every price-dependent computation
For each semantic boundary, the system generates targeted properties that distinguish the safe case from the dangerous case. A division that truncates safely at normal values might produce a loss-of-precision exploit at extreme values. The AI ensures both sides of every boundary are tested.
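A truncation boundary makes a good worked example. The sketch below is a toy model, not Recon's actual output: the same division that truncates harmlessly at normal magnitudes rounds an entire deposit to zero once the share price is inflated, and a targeted property catches the dangerous side:

```python
# Hypothetical sketch of a truncation-boundary property (toy model).

def assets_to_shares(assets: int, total_shares: int, total_assets: int) -> int:
    # Integer division truncates toward zero.
    return assets * total_shares // total_assets

def property_nonzero_mint(assets: int, total_shares: int, total_assets: int) -> bool:
    # A nonzero deposit must never mint zero shares.
    return assets == 0 or assets_to_shares(assets, total_shares, total_assets) > 0

# Safe side of the boundary: comparable magnitudes, truncation loses < 1 wei.
assert property_nonzero_mint(10**18, 10**18, 10**18)

# Dangerous side: share price inflated to 2e18 assets per share, so a 1e18
# deposit truncates to zero shares; this is the violation the fuzzer flags.
assert not property_nonzero_mint(10**18, 1, 2 * 10**18)
```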
Step 4: Campaign Execution and Analysis
The generated test suite runs as a full stateful fuzzing campaign — millions of randomized transaction sequences, each checked against every property. Violations are captured with full call sequences for reproduction.
But the AI's job does not end at campaign execution. It analyzes the corpus — the set of inputs the fuzzer explored — to identify patterns. Which functions are under-explored? Which state transitions are rare? Which property violations cluster around specific parameter ranges? This analysis feeds back into property refinement, creating a loop that converges toward complete coverage.
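The campaign loop and corpus tally described above can be sketched in miniature. Everything here is a toy stand-in (the real campaigns run Echidna or Medusa against Solidity handlers): randomized call sequences, a property checked after every call, violations captured with their full sequence, and a corpus counter that would expose under-explored functions:

```python
# Minimal sketch of a stateful fuzzing campaign over a toy vault model.
import random
from collections import Counter

def run_campaign(seed: int, sequences: int, depth: int):
    rng = random.Random(seed)  # fixed seed -> reproducible campaign
    corpus = Counter()
    violations = []
    for _ in range(sequences):
        vault = {"assets": 0, "shares": 0}
        calls = []
        for _ in range(depth):
            fn = rng.choice(["deposit", "withdraw"])
            amt = rng.randrange(0, 10**6)
            calls.append((fn, amt))
            corpus[fn] += 1  # corpus tally for under-exploration analysis
            if fn == "deposit":
                vault["assets"] += amt
                vault["shares"] += amt
            elif vault["shares"] >= amt:
                vault["assets"] -= amt
                vault["shares"] -= amt
            # Property: shares must always be fully backed by assets.
            if vault["assets"] < vault["shares"]:
                violations.append(list(calls))  # full sequence for repro
    return corpus, violations

corpus, violations = run_campaign(seed=0, sequences=200, depth=10)
assert not violations  # the toy invariant holds on this model
# A skewed corpus here would mark a function as under-explored and feed
# back into property and handler generation.
print(dict(corpus))
```

Fixing the seed is what makes the reproducibility claim above operational: the same seed replays the same sequences and the same coverage.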
What This Achieves
The combined system — automated property generation, coverage gap filling, semantic enrichment, and iterative campaign analysis — produces results that neither pure manual review nor naive AI can match:
Measurable coverage. We can report exactly how many coverage classes and semantic classes were tested, and which ones remain open. This is not a percentage bar — it is an enumerated list of every execution path through the protocol.
Reproducible results. The test suite is deterministic. Any auditor can run it, verify the results, and extend it. Two runs on the same code produce the same coverage.
Continuous protection. The generated test suite is a deliverable. It runs in CI. When the protocol changes, the properties catch regressions immediately. This is the opposite of a one-time PDF report that becomes stale the moment code changes.
Human-AI collaboration. The AI generates the coverage map. Human auditors review the flagged edge cases, the semantic boundary conditions, and the cross-function interaction paths. The human focuses on judgment — "is this behavior intended?" — while the AI handles enumeration — "here are all the behaviors."
Why Most "AI Audits" Fall Short
The fundamental error in the current AI audit landscape is treating the LLM as the end product. An LLM's opinion about code is, at best, a hint. It might be correct. It might be hallucinated. You cannot tell the difference without execution.
Our approach uses AI to generate executable artifacts — properties, test suites, coverage reports — that can be verified independently of the AI that produced them. If the AI generates a wrong property, the fuzzer will either never trigger it (dead code, filtered out) or trigger it on a false positive (caught during human review). The system is self-correcting in a way that pure LLM analysis is not.
This is the difference between "AI told us this is safe" and "AI generated a test suite that exercises 1,847 coverage classes across 23 contracts, and no property violations were found in 50 million test iterations." The first is an opinion. The second is evidence.
The Road Ahead
We are pushing this further. The integration of symbolic execution with AI-guided property generation is opening paths to proving certain properties mathematically rather than just testing them probabilistically. The combination of formal verification, fuzzing, and AI-guided test generation may eventually provide something close to a completeness guarantee for known bug classes.
For now, the practical result is already clear: AI-powered auditing, done correctly, finds more bugs, covers more code, and delivers more lasting value than either manual review or naive automation alone.
Want to see what AI-powered invariant testing finds on your protocol? Request an audit and we will show you the coverage map.