AI-Guided Fuzzing: From LLM Property Generation to Automated Campaigns
Author: deivitto | Guide
You've probably seen the hype. "AI will replace auditors." "GPT found a critical bug." "Claude wrote my invariants." Some of it's real. Most of it isn't. Here's what actually works when you pair LLMs with fuzzing campaigns — and where things fall apart fast.
I've been running AI-assisted fuzzing workflows for over a year now. The results are mixed, but the good parts are genuinely useful. Let's break it down.
The reality of AI-generated properties
LLMs are surprisingly decent at reading a contract and suggesting what should be true. They're much worse at writing the exact Solidity code that tests it. This distinction matters.
Here's a typical workflow. You feed a contract to Claude or GPT and ask: "What invariants should hold for this token vault?"
A good AI suggestion looks like this:
"The total assets reported by the vault should always be >= the sum
of all depositor shares converted to assets at the current exchange rate"
That's a real insight. It captures the core accounting invariant. But when the LLM tries to write the actual test code, you get stuff like:
// BAD: AI-generated property -- looks right, doesn't compile
function invariant_vault_accounting() public {
uint256 totalAssets = vault.totalAssets();
uint256 totalShares = vault.totalSupply();
// This doesn't even make sense -- you can't iterate depositors on-chain
for (uint i = 0; i < vault.depositors.length; i++) {
// ...
}
}
The AI doesn't understand that on-chain state isn't freely iterable. It doesn't know your ghost variable setup. It doesn't know how your fuzzing harness tracks actors. That's where human expertise comes in.
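Where the AI's loop fails, a handler-style harness fixes it by recording actors off to the side as they interact. A minimal sketch of the idea — all names here (VaultHandler, ghost_actors, IVault) are illustrative, not from any specific framework:

```solidity
// Illustrative only: a handler records every depositor in a ghost array,
// so invariants can iterate "all depositors" even though the chain can't.
interface IVault {
    function deposit(uint256 assets, address receiver) external returns (uint256);
}

contract VaultHandler {
    IVault public vault;
    address[] public ghost_actors;           // every address that ever deposited
    mapping(address => bool) internal seen;

    constructor(IVault _vault) { vault = _vault; }

    function deposit(uint256 assets) external {
        if (!seen[msg.sender]) {
            seen[msg.sender] = true;
            ghost_actors.push(msg.sender);   // this is what the AI's loop was missing
        }
        vault.deposit(assets, msg.sender);
    }
}
```

The fuzzer calls the handler instead of the vault directly, and the invariant functions read ghost_actors.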
What actually works: the three-step pattern
After a lot of trial and error, here's the pattern that consistently produces results:
Step 1: LLM reads the code, suggests English properties
Feed the contract source (or even just the interface + NatSpec) to an LLM. Ask for invariants in plain English. Don't ask for code yet.
// Example prompt output for a lending protocol:
//
// 1. Total borrows should never exceed total deposits
// 2. A user's collateral value (at liquidation threshold)
// should always cover their debt, or they should be liquidatable
// 3. Interest accrual should be monotonically increasing
// 4. The sum of all user deposits should equal totalDeposits
// 5. No single transaction should move the exchange rate by more than X%
These are solid starting points. Maybe 60-70% of them are actually testable and meaningful. The rest are either too vague, already covered by Solidity's type system, or just wrong.
Step 2: human filters and refines
This is where you earn your keep. You look at those suggestions and ask:
- Can I actually measure this in a harness?
- Does the contract have the view functions I need?
- Is this invariant conditional (only holds when X is true)?
- Is this already enforced by a require statement?
That property about "no single transaction should move the exchange rate by more than X%"? That's gold. It catches donation attacks, flash loan manipulation, and rounding exploits. You keep that one. The one about "interest accrual should be monotonically increasing" needs work. What about when rates change? What about precision loss? You refine it.
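Here's one way the refined accrual property can end up, with the rate-change and rounding questions answered in code. A sketch only — market.borrowIndex() and ghost_lastBorrowIndex are assumed names, adapt to your protocol:

```solidity
// Sketch: refine "interest accrual is monotonic" into something testable.
// market.borrowIndex() and ghost_lastBorrowIndex are assumed names.
function invariant_borrow_index_monotonic() public {
    uint256 idx = market.borrowIndex();
    // Rate changes alter the SLOPE of accrual, not its direction:
    // the cumulative index should never go down. Allow 1 wei for rounding.
    assert(idx + 1 >= ghost_lastBorrowIndex);
    ghost_lastBorrowIndex = idx;
}
```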
Step 3: human writes the actual test code
// GOOD: Human-written property inspired by AI suggestion
function invariant_exchange_rate_bounded() public {
uint256 currentRate = vault.convertToAssets(1e18);
uint256 previousRate = ghost_lastExchangeRate;
if (previousRate > 0) {
uint256 maxDelta = previousRate * MAX_RATE_CHANGE_BPS / 10000;
assert(
currentRate <= previousRate + maxDelta &&
currentRate >= previousRate - maxDelta
);
}
ghost_lastExchangeRate = currentRate;
}
See the difference? The human knows about ghost variables, knows how the harness tracks state between calls, and knows the right precision for comparisons.
AI for campaign configuration
Where AI gets interesting is campaign setup. Instead of writing properties, you can use LLMs to generate fuzzer configurations.
# AI-suggested medusa config after analyzing contract complexity
fuzzing:
workers: 8
callSequenceLength: 50
corpusDirectory: "./corpus"
coverageEnabled: true
targetContracts:
- "VaultHarness"
testing:
testAllContracts: false
assertionTesting:
enabled: true
propertyTesting:
enabled: true
maxSequenceLength: 100
This isn't magic; it's just the LLM recognizing patterns. A vault with multiple entry points needs longer call sequences. A protocol with time-dependent logic needs the fuzzer to warp timestamps. You'd figure this out yourself, but the AI saves you 10 minutes of config tweaking.
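For time-dependent protocols, the relevant knobs are the delay bounds between calls. A sketch of what that section might look like — the field names follow medusa's project config, but verify them against your installed version:

```yaml
# Hedged sketch -- confirm field names against your medusa version
fuzzing:
  # Let the fuzzer advance block number / timestamp between calls,
  # so interest accrual and timelock paths actually get exercised
  blockNumberDelayMax: 60480
  blockTimestampDelayMax: 604800   # up to ~1 week between calls
```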
Good vs bad AI properties: a side-by-side
Let's look at a real example. Given an AMM contract:
AI property (bad)
// AI generated this -- it LOOKS reasonable
function invariant_constant_product() public {
uint256 reserve0 = pool.getReserve0();
uint256 reserve1 = pool.getReserve1();
uint256 k = reserve0 * reserve1;
assert(k >= initialK); // "k should never decrease"
}
Why it's bad: k should increase from fees, but this doesn't account for rounding, fee-on-transfer tokens, or the fact that initialK needs to be tracked as a ghost variable reset after every legitimate liquidity add or remove. And with large reserves the multiplication can overflow — a revert under Solidity 0.8+, silent wraparound before.
Human-refined property (good)
function invariant_k_non_decreasing() public {
uint256 reserve0 = pool.getReserve0();
uint256 reserve1 = pool.getReserve1();
// Cast before multiplying in case the getters return narrow types (e.g. uint112)
uint256 currentK = uint256(reserve0) * uint256(reserve1);
// k can decrease by at most 1 wei due to rounding per swap
// Track cumulative rounding tolerance
uint256 tolerance = ghost_swapCount * 1;
if (ghost_previousK > 0) {
assert(currentK + tolerance >= ghost_previousK);
}
ghost_previousK = currentK;
}
The human version handles rounding and overflow while tracking state properly. The AI gave us the idea — the human made it work.
When AI hallucinates (and how to catch it)
LLMs confidently produce nonsense properties about 20-30% of the time. Common failure modes:
- Impossible state assertions
The AI claims "totalSupply should equal the sum of all balances." For most ERC20s, this is true. But for rebasing tokens? Fee-on-transfer? It's wrong and the AI won't tell you.
- Inverted logic
// AI wrote this backwards
function invariant_health_factor() public {
// WRONG: this asserts users are ALWAYS healthy
// but users CAN be unhealthy -- that's when liquidation kicks in
assert(protocol.healthFactor(user) >= 1e18);
}
The correct property is: if a user's health factor is below 1, then calling liquidate() should succeed. The AI confused "desired state" with "invariant."
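Expressed as harness code, the corrected conditional property might look like this — protocol.healthFactor(), protocol.liquidate(), and ghost_lastActor are assumed names, adapt to your harness:

```solidity
// Sketch: "unhealthy implies liquidatable", not "always healthy".
// protocol, liquidate(), and ghost_lastActor are assumed names.
function invariant_unhealthy_users_liquidatable() public {
    address user = ghost_lastActor;
    if (protocol.healthFactor(user) < 1e18) {
        try protocol.liquidate(user) {
            // expected: liquidating an unhealthy position succeeds
        } catch {
            assert(false); // unhealthy but un-liquidatable -- real bug territory
        }
    }
}
```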
- Missing preconditions
AI writes a property that should only hold after initialization, or only when the protocol isn't paused, or only for non-zero balances. It skips the if guard and you get false positives everywhere.
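The fix is mechanical once you spot it: put the preconditions back as early returns. A sketch, assuming hypothetical vault.initialized() and vault.paused() view functions:

```solidity
// Sketch: the guards the AI tends to drop. initialized()/paused() are assumed views.
function invariant_share_price_positive() public {
    if (!vault.initialized()) return;       // only holds after setup
    if (vault.paused()) return;             // pause may freeze accounting
    if (vault.totalSupply() == 0) return;   // rate undefined at zero supply
    assert(vault.convertToAssets(1e18) > 0);
}
```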
Building an AI-assisted pipeline
Here's the practical workflow I use:
1. Feed the codebase to an LLM (full contracts, interfaces, existing tests)
2. Ask for invariants in English first. No code, just natural language
3. Filter the list. Remove duplicates, impossibles, and trivially-true statements
4. Ask the LLM to categorize. Which are safety properties vs liveness vs economic?
5. Write harness code yourself, or pair with the LLM for boilerplate
6. Run the campaign. Let the fuzzer loose
7. Feed failures back to the LLM. "This property broke, here's the call trace, why?"
Step 7 is underrated. LLMs are actually good at reading a fuzzer's counterexample and explaining why the property broke. It's like having a junior auditor who reads stack traces fast.
// Example: AI helped identify this after seeing a fuzzer trace
// The fuzzer found that calling deposit(0) followed by withdraw(0)
// changed the exchange rate due to rounding
function test_zero_amount_exchange_rate() public {
uint256 rateBefore = vault.convertToAssets(1e18);
vault.deposit(0, address(this));
vault.withdraw(0, address(this), address(this));
uint256 rateAfter = vault.convertToAssets(1e18);
// Zero-amount operations shouldn't change the rate
assertEq(rateBefore, rateAfter);
}
When to skip AI entirely
Don't bother with AI-generated properties when:
- The codebase is small (< 200 LOC). You can read it faster than prompting.
- It's a well-known pattern (ERC20, ERC721, standard vault). Just use property templates from invariant testing guides.
- The logic is highly mathematical. AMM curve math, options pricing. AI gets the math wrong more often than right.
- You need formal guarantees. Use formal verification tools instead.
Tool integration
The best results come from tight integration between your LLM and your fuzzing toolchain. If you're using Recon's fuzzing framework, you can pipe AI-generated property skeletons directly into your harness templates.
For EVM projects, check out the comparison between Echidna and Medusa — the two handle AI-generated properties differently because of their corpus management approaches.
If you want to see AI-assisted auditing in action, our AI auditing guide gives a deeper breakdown covering the full spectrum from property generation to report writing.
And for the complete picture of how fuzzing tools compare when fed AI-generated configs, see our tools comparison.
Practical takeaways
- Use AI for ideation, not implementation. LLMs suggest what to test. Humans write the tests.
- English first, code second. Always get natural language properties before asking for Solidity.
- Budget 30% of AI suggestions as garbage. That's normal. The 70% that's good saves you real time.
- Feed counterexamples back. LLMs are great at explaining why something broke.
- Don't trust AI on math-heavy protocols. The models mess up invariant math regularly.
AI-guided fuzzing isn't a replacement for knowing what you're doing. It's a multiplier for people who already understand invariant testing and security patterns. Use it that way and you'll ship better campaigns, faster.