Differential testing for smart contracts: comparing implementations to find bugs
By antonio — April 2026
Here's a simple idea that catches surprisingly nasty bugs: take two implementations of the same thing and compare their outputs. If they disagree, at least one of them is wrong.
That's differential testing. It's been a workhorse technique in compiler testing and browser security for decades. It works just as well for smart contracts — maybe even better, because DeFi is full of multiple implementations of the same specs.
Let me show you how to set it up, where it shines, and the real bugs it catches.
What is differential testing?
The concept is straightforward:
- You have two (or more) implementations of the same specification
- You feed them the same inputs
- You compare their outputs
- Any difference is a bug in at least one implementation
The power is that you don't need to know what the correct output should be. You just need to know that both implementations should agree. This lets you generate millions of random inputs without writing specific expected outputs for each one.
Input → Implementation A → Output A ─┐
                                     ├─→ Compare → Mismatch = Bug
Input → Implementation B → Output B ─┘
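The loop in the diagram fits in a few lines of Python. This is a minimal sketch that is not tied to any contract: `isqrt_newton` is a hypothetical hand-written implementation, and Python's built-in `math.isqrt` plays the second implementation.

```python
import math
import random


def isqrt_newton(n: int) -> int:
    """Hypothetical hand-written integer square root (Newton's method)."""
    if n < 2:
        return n
    x = 1 << ((n.bit_length() + 1) // 2)  # initial guess, always >= sqrt(n)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y


def differential_check(impl_a, impl_b, inputs):
    """Return every input on which the two implementations disagree."""
    return [n for n in inputs if impl_a(n) != impl_b(n)]


rng = random.Random(0)  # fixed seed so any failure can be replayed
samples = [rng.randrange(2**256) for _ in range(1_000)]
mismatches = differential_check(isqrt_newton, math.isqrt, samples)
print(f"{len(mismatches)} mismatches out of {len(samples)} inputs")
```

Note that the harness never states what the square root of any input is. It only reports disagreements, and each disagreement is a concrete input you can replay.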
For smart contracts, the "implementations" can be:
- Two different contracts implementing the same ERC standard
- A reference implementation vs a gas-tuned version
- The same contract compiled with different Solidity versions
- A Solidity implementation vs a Vyper implementation
- An on-chain contract vs an off-chain simulator
When differential testing makes sense
Not every project needs differential testing. Here's when it's worth the setup cost:
You're building a gas-efficient version of something standard. If you're writing a gas-tuned ERC-20 or a custom AMM based on a known formula, differential testing against the reference implementation catches bugs introduced by the rewrite.
You're migrating between versions. Upgrading from Solidity 0.7 to 0.8? Migrating a Vyper contract to Solidity? Differential testing verifies behavioral equivalence.
You have a spec with multiple implementations. ERC-4626 vaults, ERC-2612 permits, or any standard where multiple teams have written compliant implementations.
You have an off-chain model. Many DeFi protocols have Python or TypeScript models for their math. Differential testing against the on-chain implementation catches precision and rounding bugs.
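To illustrate the off-chain model case, here is a sketch of two Python models of a constant-product quote with a 0.3% fee. Both function names are made up for this example. The integer model mirrors EVM arithmetic step by step, while the float model is the kind of "equivalent" shortcut that silently loses precision at 18-decimal scale.

```python
def get_amount_out(amount_in: int, reserve_in: int, reserve_out: int) -> int:
    """Integer model: same operations and truncating division as the EVM."""
    amount_in_with_fee = amount_in * 997
    numerator = amount_in_with_fee * reserve_out
    denominator = reserve_in * 1000 + amount_in_with_fee
    return numerator // denominator


def get_amount_out_float(amount_in: int, reserve_in: int, reserve_out: int) -> int:
    """Float model: reads the same, but a 64-bit float keeps about 16 digits."""
    fee_adjusted = amount_in * 0.997
    return int(fee_adjusted * reserve_out / (reserve_in + fee_adjusted))


# Small numbers: both models agree
small = (1_000, 1_000_000, 1_000_000)
print(get_amount_out(*small), get_amount_out_float(*small))

# Token amounts with 18 decimals: the float model drifts in the low digits
large = (10**18, 10**24, 10**24)
print(get_amount_out(*large), get_amount_out_float(*large))
```

If the off-chain model is meant to be the reference, it should be the integer one. A float-based model is better treated as one more implementation under test.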
Setting up differential tests in Foundry
Foundry makes differential testing relatively straightforward. Here's a complete example comparing two AMM implementations.
The setup: two AMM implementations
Say we have a reference AMM and a gas-efficient version:
// ReferenceAMM.sol -- clear, correct, not gas-efficient
pragma solidity ^0.8.19;

contract ReferenceAMM {
    uint256 public reserveA;
    uint256 public reserveB;

    constructor(uint256 _reserveA, uint256 _reserveB) {
        reserveA = _reserveA;
        reserveB = _reserveB;
    }

    function getAmountOut(
        uint256 amountIn,
        bool isTokenA
    ) external view returns (uint256 amountOut) {
        uint256 reserveIn = isTokenA ? reserveA : reserveB;
        uint256 reserveOut = isTokenA ? reserveB : reserveA;
        // Standard constant product formula: x * y = k
        // amountOut = reserveOut - (reserveIn * reserveOut) /
        //             (reserveIn + amountIn)
        // With 0.3% fee
        uint256 amountInWithFee = amountIn * 997;
        uint256 numerator = amountInWithFee * reserveOut;
        uint256 denominator = (reserveIn * 1000) + amountInWithFee;
        amountOut = numerator / denominator;
    }

    function swap(uint256 amountIn, bool isTokenA) external
        returns (uint256 amountOut)
    {
        amountOut = this.getAmountOut(amountIn, isTokenA);
        if (isTokenA) {
            reserveA += amountIn;
            reserveB -= amountOut;
        } else {
            reserveB += amountIn;
            reserveA -= amountOut;
        }
    }
}
// OptimizedAMM.sol -- gas-efficient, uses assembly
pragma solidity ^0.8.19;

contract OptimizedAMM {
    uint256 public reserveA;
    uint256 public reserveB;

    constructor(uint256 _reserveA, uint256 _reserveB) {
        reserveA = _reserveA;
        reserveB = _reserveB;
    }

    function getAmountOut(
        uint256 amountIn,
        bool isTokenA
    ) external view returns (uint256 amountOut) {
        assembly {
            // Relies on reserveB sitting in the slot right after reserveA
            let reserveIn := sload(
                add(reserveA.slot, iszero(isTokenA))
            )
            let reserveOut := sload(
                add(reserveA.slot, iszero(iszero(isTokenA)))
            )
            // mul and add wrap on overflow here; the reference reverts
            let amountInWithFee := mul(amountIn, 997)
            let numerator := mul(amountInWithFee, reserveOut)
            let denominator := add(
                mul(reserveIn, 1000), amountInWithFee
            )
            amountOut := div(numerator, denominator)
        }
    }

    function swap(uint256 amountIn, bool isTokenA) external
        returns (uint256 amountOut)
    {
        amountOut = this.getAmountOut(amountIn, isTokenA);
        if (isTokenA) {
            reserveA += amountIn;
            reserveB -= amountOut;
        } else {
            reserveB += amountIn;
            reserveA -= amountOut;
        }
    }
}
The differential fuzz test
// test/DifferentialAMM.t.sol
pragma solidity ^0.8.19;

import "forge-std/Test.sol";
import "../src/ReferenceAMM.sol";
import "../src/OptimizedAMM.sol";

contract DifferentialAMMTest is Test {
    ReferenceAMM ref;
    OptimizedAMM opt;

    function setUp() public {
        // Same initial state
        ref = new ReferenceAMM(1_000_000e18, 1_000_000e18);
        opt = new OptimizedAMM(1_000_000e18, 1_000_000e18);
    }

    /// @dev Fuzz test: getAmountOut should match for any input
    function testFuzz_getAmountOut_matches(
        uint256 amountIn,
        bool isTokenA
    ) public view {
        // Bound to reasonable range
        amountIn = bound(amountIn, 1, 1_000_000e18);
        uint256 refOut = ref.getAmountOut(amountIn, isTokenA);
        uint256 optOut = opt.getAmountOut(amountIn, isTokenA);
        assertEq(
            refOut,
            optOut,
            "getAmountOut mismatch between reference and gas-efficient"
        );
    }

    /// @dev Fuzz test: swap sequences should produce same state
    function testFuzz_swapSequence_matches(
        uint256[5] calldata amounts,
        bool[5] calldata directions
    ) public {
        for (uint256 i = 0; i < 5; i++) {
            uint256 amount = bound(amounts[i], 1, 100_000e18);
            uint256 refOut = ref.swap(amount, directions[i]);
            uint256 optOut = opt.swap(amount, directions[i]);
            assertEq(
                refOut,
                optOut,
                string.concat(
                    "Swap output mismatch at step ",
                    vm.toString(i)
                )
            );
        }
        // Final reserves should match exactly
        assertEq(ref.reserveA(), opt.reserveA(), "reserveA mismatch");
        assertEq(ref.reserveB(), opt.reserveB(), "reserveB mismatch");
    }
}
Run it:
forge test --match-contract DifferentialAMMTest -vvv --fuzz-runs 10000
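If the assembly optimization has a bug (say, the storage slot calculation for reserveB is off by one), the fuzzer will find inputs where the outputs diverge. Keep in mind that it only explores the range you bound it to: a divergence that needs an amountIn above 1_000_000e18 will never be reported by this test.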
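One divergence the bounded range hides is overflow handling: the reference contract uses checked arithmetic, while the `mul` and `add` opcodes in the assembly version wrap modulo 2**256. The sketch below models both behaviors in Python. The helper names are invented for illustration, and a revert is modelled as the string "revert".

```python
UINT256_MAX = 2**256 - 1


def checked(value: int) -> int:
    """Solidity >= 0.8 arithmetic: overflow reverts (modelled as an exception)."""
    if value > UINT256_MAX:
        raise OverflowError("Panic(0x11)")
    return value


def wrapping(value: int) -> int:
    """EVM mul/add inside an assembly block: wraps modulo 2**256."""
    return value & UINT256_MAX


def quote(arith, amount_in: int, reserve_in: int, reserve_out: int):
    """Constant-product quote; returns "revert" if checked arithmetic overflows."""
    try:
        amount_in_with_fee = arith(amount_in * 997)
        numerator = arith(amount_in_with_fee * reserve_out)
        denominator = arith(arith(reserve_in * 1000) + amount_in_with_fee)
    except OverflowError:
        return "revert"
    # Assembly div yields 0 for a zero denominator; the checked path cannot
    # reach a zero denominator while reserve_in > 0
    return numerator // denominator if denominator else 0


R = 10**24  # the reserves used in setUp above
for amount_in in (10**18, 2**200):
    print(amount_in, quote(checked, amount_in, R, R), quote(wrapping, amount_in, R, R))
```

With these reserves, the two models agree for every amountIn the test generates and start to disagree only once the numerator exceeds 2**256, which takes an amountIn above roughly 10**50. Whether that range matters is a design decision, but it is worth making it explicitly rather than letting the bound decide.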
Cross-language differential testing with FFI
One of the most powerful applications: comparing your Solidity implementation against a Python or Rust reference using Foundry's FFI.
Solidity vs Python math
// test/DifferentialMath.t.sol
// Needs FFI enabled: set ffi = true in foundry.toml or pass --ffi
pragma solidity ^0.8.19;

import "forge-std/Test.sol";
import "../src/MathLib.sol";

contract DifferentialMathTest is Test {
    MathLib lib;

    function setUp() public {
        lib = new MathLib();
    }

    function testFuzz_sqrt_matchesPython(uint256 x) public {
        x = bound(x, 0, type(uint128).max);
        // Get Solidity result
        uint256 solidityResult = lib.sqrt(x);
        // Get Python result via FFI. The script prints a 0x-prefixed
        // 32-byte word: vm.ffi hex-decodes the output whenever it can,
        // so a bare decimal such as "1234" would be read as bytes 0x1234.
        string[] memory cmd = new string[](3);
        cmd[0] = "python3";
        cmd[1] = "-c";
        cmd[2] = string.concat(
            "import math; print('0x' + format(math.isqrt(",
            vm.toString(x),
            "), '064x'))"
        );
        bytes memory result = vm.ffi(cmd);
        uint256 pythonResult = abi.decode(result, (uint256));
        assertEq(
            solidityResult,
            pythonResult,
            string.concat(
                "sqrt mismatch for input ",
                vm.toString(x)
            )
        );
    }

    function testFuzz_expWad_matchesPython(int256 x) public {
        // Bound to range where exp doesn't overflow. The upper limit is
        // exclusive in solmate-style implementations, which revert at
        // 135305999368893231589 itself; adjust if MathLib differs.
        x = bound(x, -42139678854452767551, 135305999368893231588);
        int256 solidityResult = lib.expWad(x);
        string[] memory cmd = new string[](3);
        cmd[0] = "python3";
        cmd[1] = "-c";
        cmd[2] = string.concat(
            "from decimal import Decimal, getcontext; ",
            "getcontext().prec = 100; ",
            "x = Decimal('",
            vm.toString(x),
            "') / Decimal(10**18); ",
            // Stay in Decimal: math.exp(float(x)) keeps only ~16 digits
            "result = int(x.exp() * Decimal(10**18)); ",
            "print('0x' + format(result % 2**256, '064x'))"
        );
        bytes memory result = vm.ffi(cmd);
        int256 pythonResult = abi.decode(result, (int256));
        // Allow 1 wei tolerance for rounding differences; widen this to
        // your library's documented error bound if it is looser
        assertApproxEqAbs(
            solidityResult,
            pythonResult,
            1,
            "expWad mismatch"
        );
    }
}
This technique catches subtle fixed-point arithmetic bugs that are really hard to spot in manual review. Python's decimal module gives you as much precision as you configure, provided the whole calculation stays in Decimal and never passes through a 64-bit float.
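Once the one-liner grows, it is easier to keep the reference in a standalone script and call that through FFI. The following is one possible shape for such a script, with all function names invented for this sketch. It also shows why the float shortcut is not a usable reference, and prints results as a 0x-prefixed 32-byte word, a form that `vm.ffi` decodes unambiguously and `abi.decode` can read.

```python
from decimal import Decimal, getcontext
import math

getcontext().prec = 100  # enough digits for results up to about 2**255

WAD = 10**18


def exp_wad_reference(x: int) -> int:
    """exp(x / 1e18) * 1e18, truncated, computed entirely in Decimal."""
    return int((Decimal(x) / WAD).exp() * WAD)


def exp_wad_float(x: int) -> int:
    """Same formula routed through a 64-bit float: only ~16 digits survive."""
    return int(Decimal(str(math.exp(x / WAD))) * WAD)


def abi_encode_int256(value: int) -> str:
    """0x-prefixed two's-complement word, decodable with abi.decode."""
    return "0x" + format(value % 2**256, "064x")


x = 50 * WAD
print(exp_wad_reference(x))
print(exp_wad_float(x))  # agrees only in the leading digits
print(abi_encode_int256(exp_wad_reference(x)))
```

A test would pass `x` as a command-line argument and decode the printed word on the Solidity side. How you wire that up depends on your repository layout, so it is left out here.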
Cross-version differential testing
Solidity version changes introduce behavioral differences. Some are documented, some aren't.
Solidity 0.7 vs 0.8 behavior
The biggest change was checked arithmetic: +, - and * now revert on overflow instead of wrapping. Some edge cases are often assumed to have changed as well but did not, and a differential test pins down both kinds:
// test/CrossVersion.t.sol
// This test compares behavior between a 0.7-style implementation
// (using unchecked) and a 0.8 implementation
pragma solidity ^0.8.19;

import "forge-std/Test.sol";
// LegacyMath and ModernMath are assumed to live in src/; adjust the paths
import "../src/LegacyMath.sol";
import "../src/ModernMath.sol";

contract CrossVersionTest is Test {
    LegacyMath legacy; // Uses unchecked blocks to mimic 0.7
    ModernMath modern; // Standard 0.8 checked arithmetic

    function setUp() public {
        legacy = new LegacyMath();
        modern = new ModernMath();
    }

    function testFuzz_division_behavior(
        uint256 a,
        uint256 b
    ) public {
        if (b == 0) {
            // Division by zero fails in both versions: 0.7 hit an invalid
            // opcode (consuming all gas), 0.8 reverts with Panic(0x12).
            // An unchecked block does not disable this check; only the
            // assembly-level div opcode returns 0.
            vm.expectRevert();
            modern.divide(a, b);
            // If legacy returns 0 here, it was ported with assembly div
            // and this differential test catches the discrepancy
            vm.expectRevert();
            legacy.divide(a, b);
            return;
        }
        assertEq(
            legacy.divide(a, b),
            modern.divide(a, b),
            "Division result mismatch"
        );
    }

    function testFuzz_shift_behavior(
        uint256 value,
        uint256 shift
    ) public {
        // Shifts are never overflow-checked, in 0.7 or in 0.8. For
        // uint256, shifting right by >= 256 bits yields 0 in both.
        shift = bound(shift, 0, 512);
        uint256 modernResult = modern.shiftRight(value, shift);
        if (shift >= 256) {
            assertEq(
                modernResult,
                0,
                "Shift >= 256 should return 0"
            );
        }
        assertEq(
            legacy.shiftRight(value, shift),
            modernResult,
            "Shift result mismatch"
        );
    }
}
This is especially useful during protocol migrations. We've seen bugs introduced during 0.7→0.8 migrations where developers added unchecked blocks in the wrong places, accidentally preserving overflow behavior in functions that should've been checked.
ABI encoding differential tests
ABI encoding bugs are subtle and dangerous. Compare your manual encoding against Solidity's built-in encoder:
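function testFuzz_customEncoding_matchesABI(
    address addr,
    uint256 amount,
    bytes32 id
) public pure {
    // Your custom encoding (maybe for gas optimization):
    // 20 + 32 + 32 = 84 bytes, no padding
    bytes memory custom = abi.encodePacked(
        bytes20(addr),
        bytes32(amount),
        id
    );
    // Standard encoding: three 32-byte words
    bytes memory standard = abi.encode(addr, amount, id);
    // The raw bytes SHOULD differ (packed vs padded)
    assertEq(custom.length, 84);
    assertEq(standard.length, 96);
    // The real test: decode each format with its own decoder
    // and compare the values
    address customAddr;
    uint256 customAmount;
    bytes32 customId;
    assembly {
        let data := add(custom, 0x20) // skip the length word
        customAddr := shr(96, mload(data)) // first 20 bytes
        customAmount := mload(add(data, 20))
        customId := mload(add(data, 52))
    }
    (address decodedAddr, uint256 decodedAmount, bytes32 decodedId) =
        abi.decode(standard, (address, uint256, bytes32));
    assertEq(customAddr, decodedAddr);
    assertEq(customAmount, decodedAmount);
    assertEq(customId, decodedId);
    assertEq(decodedAddr, addr);
    assertEq(decodedAmount, amount);
    assertEq(decodedId, id);
}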
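The same round-trip idea can be modelled off-chain, which helps when the custom format is also produced or consumed by a backend. Below is a sketch of both layouts for an (address, uint256, bytes32) tuple. The function names are invented for this example, and the layouts are the packed 84-byte and padded 96-byte forms described above.

```python
import random


def encode_standard(addr: bytes, amount: int, ident: bytes) -> bytes:
    """abi.encode layout: three 32-byte words, address left-padded with zeros."""
    return addr.rjust(32, b"\x00") + amount.to_bytes(32, "big") + ident


def encode_packed(addr: bytes, amount: int, ident: bytes) -> bytes:
    """Packed layout: 20 + 32 + 32 = 84 bytes, no padding."""
    return addr + amount.to_bytes(32, "big") + ident


def decode_standard(data: bytes):
    return data[12:32], int.from_bytes(data[32:64], "big"), data[64:96]


def decode_packed(data: bytes):
    return data[0:20], int.from_bytes(data[20:52], "big"), data[52:84]


def round_trips_agree(rng: random.Random, runs: int = 1_000) -> bool:
    """Differential check: both formats must decode to the same values."""
    for _ in range(runs):
        values = (rng.randbytes(20), rng.randrange(2**256), rng.randbytes(32))
        via_standard = decode_standard(encode_standard(*values))
        via_packed = decode_packed(encode_packed(*values))
        if not (via_standard == via_packed == values):
            return False
    return True


print(round_trips_agree(random.Random(0)))
```

The encoded bytes are expected to differ between the two formats. What the check enforces is that each decoder recovers the original values from its own format.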
Real bugs found by differential testing
Let me share some real patterns where differential testing caught issues:
1. Rounding direction discrepancy
A vault's deposit() function rounded shares down (correct, favors the vault), but previewDeposit() rounded up (incorrect, overpromised shares):
function testFuzz_depositPreview_matches(uint256 assets) public {
    assets = bound(assets, 1, 1_000_000e18);
    uint256 previewedShares = vault.previewDeposit(assets);
    uint256 actualShares = vault.deposit(assets, address(this));
    // ERC-4626 spec: previewDeposit MUST return <= actual shares
    assertLe(
        previewedShares,
        actualShares,
        "Preview overpromised shares"
    );
}
The fuzzer found inputs where previewDeposit returned more shares than deposit actually minted. This is a spec violation that can cause accounting bugs in integrating contracts.
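The arithmetic behind this bug is small enough to model directly. The sketch below assumes the usual share formula `assets * totalSupply / totalAssets` and compares floor and ceiling division. The function names are made up for illustration.

```python
def shares_round_down(assets: int, total_supply: int, total_assets: int) -> int:
    """Floor division, as deposit() did in this example."""
    return assets * total_supply // total_assets


def shares_round_up(assets: int, total_supply: int, total_assets: int) -> int:
    """Ceiling division, as the buggy previewDeposit() did."""
    return -(-(assets * total_supply) // total_assets)


def find_overpromise(total_supply: int, total_assets: int, max_assets: int):
    """Smallest deposit for which the preview exceeds the actual mint, if any."""
    for assets in range(1, max_assets + 1):
        preview = shares_round_up(assets, total_supply, total_assets)
        actual = shares_round_down(assets, total_supply, total_assets)
        if preview > actual:
            return assets
    return None


print(find_overpromise(100, 300, 10))  # vault has accrued yield: 1 share = 3 assets
print(find_overpromise(100, 100, 10))  # shares and assets still 1:1
```

The two rounding modes agree whenever the division is exact, which is why a vault that has never accrued yield can pass this test for a long time. Seeding the vault with a non-trivial share price before fuzzing makes the discrepancy show up quickly.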
2. Assembly optimization gone wrong
A hand-rolled mulDiv function in assembly produced incorrect results for specific input ranges near type(uint256).max:
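function testFuzz_mulDiv_reference(
    uint256 a,
    uint256 b,
    uint256 denominator
) public pure {
    // Keep denominator >= min(a, b) so the result always fits in 256
    // bits, while a * b itself may still overflow into the high word
    uint256 smaller = a < b ? a : b;
    denominator = bound(
        denominator,
        smaller == 0 ? 1 : smaller,
        type(uint256).max
    );
    uint256 fast = OptimizedMath.mulDiv(a, b, denominator);
    // Full-precision reference. A plain (a * b) / denominator would
    // force the test to skip exactly the inputs where the bug lives.
    // Assumes OpenZeppelin's Math library is imported.
    // (The variable is not named "reference": that word is reserved.)
    uint256 expected = Math.mulDiv(a, b, denominator);
    assertEq(fast, expected, "mulDiv mismatch");
}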
The assembly version had an off-by-one in its high-word multiplication logic. Only triggered when both a and b had specific bit patterns in their upper 128 bits.
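Inputs with bits set in the upper 128 bits of both operands produce a product wider than 256 bits, so the reference has to carry the full 512-bit intermediate. Python integers do that for free. The sketch below contrasts a full-precision model with a deliberately broken one that drops the high word. It stands in for the class of bug described here and is not the actual faulty code.

```python
import random

UINT256_MAX = 2**256 - 1


def mul_div_reference(a: int, b: int, denominator: int) -> int:
    """floor(a * b / denominator) with an unbounded intermediate product."""
    return a * b // denominator


def mul_div_truncated(a: int, b: int, denominator: int) -> int:
    """Illustrative bug: the product is cut to 256 bits before dividing."""
    return ((a * b) & UINT256_MAX) // denominator


def sample_case(rng: random.Random):
    """Random inputs whose result fits in 256 bits (denominator >= min(a, b))."""
    a = rng.randrange(2**256)
    b = rng.randrange(2**256)
    denominator = rng.randrange(max(1, min(a, b)), 2**256)
    return a, b, denominator


rng = random.Random(0)
cases = [sample_case(rng) for _ in range(1_000)]
mismatches = [c for c in cases if mul_div_reference(*c) != mul_div_truncated(*c)]
print(f"{len(mismatches)} of {len(cases)} cases diverge")
```

A test that skips every input where `a * b` overflows would report zero mismatches for the truncated version, which is why the choice of reference and the input filter matter as much as the comparison itself.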
3. Cross-chain behavior difference
A contract deployed on both Ethereum and Arbitrum produced different results for the same inputs because of PUSH0 opcode availability and different gas costs affecting an internal gas-bounded loop:
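function testFuzz_crossChain_equivalence(
    uint256 input
) public {
    // "mainnet" and "arbitrum" are RPC aliases from foundry.toml;
    // target must be deployed at the same address on both chains.
    // Creating a fork per run keeps the example short but is slow:
    // for long campaigns, create both forks once in setUp and
    // switch between them with vm.selectFork.
    // Fork Ethereum mainnet
    vm.createSelectFork("mainnet");
    uint256 mainnetResult = target.compute(input);
    // Fork Arbitrum
    vm.createSelectFork("arbitrum");
    uint256 arbResult = target.compute(input);
    assertEq(
        mainnetResult,
        arbResult,
        "Cross-chain result mismatch"
    );
}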
Advanced: differential invariant testing
Combine differential testing with invariant testing for maximum coverage. Instead of comparing single function calls, compare entire operation sequences:
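contract DifferentialInvariantTest is Test {
    ReferenceVault refVault;
    OptimizedVault optVault;
    // Assumed: a test ERC-20 deployed in setUp before the vaults
    // (IERC20 as in forge-std/interfaces/IERC20.sol)
    IERC20 asset;
    DiffHandler handler;

    function setUp() public {
        // asset = <deploy your test token here>;
        refVault = new ReferenceVault(address(asset));
        optVault = new OptimizedVault(address(asset));
        handler = new DiffHandler(refVault, optVault, asset);
        targetContract(address(handler));
    }

    function invariant_stateAlwaysMatches() public view {
        // Return-value mismatches are recorded by the handler and
        // surfaced here, because a revert inside a handler is ignored
        // unless fail_on_revert is enabled
        assertFalse(handler.diverged(), "handler saw a mismatch");
        assertEq(
            refVault.totalAssets(),
            optVault.totalAssets(),
            "totalAssets diverged"
        );
        assertEq(
            refVault.totalSupply(),
            optVault.totalSupply(),
            "totalSupply diverged"
        );
    }
}

contract DiffHandler is Test {
    ReferenceVault ref;
    OptimizedVault opt;
    IERC20 asset;
    address[] actors;
    bool public diverged; // ghost flag read by the invariant

    constructor(ReferenceVault _ref, OptimizedVault _opt, IERC20 _asset) {
        ref = _ref;
        opt = _opt;
        asset = _asset;
        // Three funded actors; count and balance are arbitrary choices
        for (uint256 i = 0; i < 3; i++) {
            address actor = makeAddr(string.concat("actor", vm.toString(i)));
            deal(address(asset), actor, 1_000_000e18);
            actors.push(actor);
        }
    }

    // Every handler function performs the same action on both
    function deposit(uint256 amount, uint256 actorSeed) external {
        address actor = actors[actorSeed % actors.length];
        // Half the balance, since the same amount goes into each vault
        uint256 maxAmount = asset.balanceOf(actor) / 2;
        if (maxAmount == 0) return;
        amount = bound(amount, 1, maxAmount);
        // Deposit into both with same params
        vm.startPrank(actor);
        asset.approve(address(ref), amount);
        uint256 refShares = ref.deposit(amount, actor);
        asset.approve(address(opt), amount);
        uint256 optShares = opt.deposit(amount, actor);
        vm.stopPrank();
        if (refShares != optShares) diverged = true;
    }

    function withdraw(uint256 amount, uint256 actorSeed) external {
        address actor = actors[actorSeed % actors.length];
        uint256 maxRef = ref.maxWithdraw(actor);
        uint256 maxOpt = opt.maxWithdraw(actor);
        if (maxRef != maxOpt) {
            diverged = true;
            return;
        }
        if (maxRef == 0) return;
        amount = bound(amount, 1, maxRef);
        vm.startPrank(actor);
        // ERC-4626 withdraw returns the number of shares burned
        uint256 refShares = ref.withdraw(amount, actor, actor);
        uint256 optShares = opt.withdraw(amount, actor, actor);
        vm.stopPrank();
        if (refShares != optShares) diverged = true;
    }
}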
This catches state divergence that only shows up after specific sequences of operations. The fuzzer generates random sequences of deposits and withdrawals, and the invariant checks that both implementations stay in sync at every step.
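The sequence-level idea can be prototyped off-chain before writing any handler. The sketch below uses a toy vault model with a hypothetical `round_up` switch standing in for an implementation bug, and a `donate` operation standing in for yield. None of these names come from a real library.

```python
import random


class VaultModel:
    """Toy share-accounting model; round_up=True injects a rounding bug."""

    def __init__(self, round_up: bool = False):
        self.total_assets = 0
        self.total_supply = 0
        self.round_up = round_up

    def deposit(self, assets: int) -> int:
        if self.total_supply == 0:
            shares = assets
        else:
            product = assets * self.total_supply
            shares = product // self.total_assets
            if self.round_up and product % self.total_assets:
                shares += 1
        self.total_assets += assets
        self.total_supply += shares
        return shares

    def donate(self, assets: int) -> None:
        """Yield or donation: assets grow, share supply does not."""
        self.total_assets += assets

    def state(self):
        return (self.total_assets, self.total_supply)


def first_divergence(ref, opt, ops):
    """Apply each op to both models; return the first step where they differ."""
    for step, (name, amount) in enumerate(ops):
        out_ref = getattr(ref, name)(amount)
        out_opt = getattr(opt, name)(amount)
        if out_ref != out_opt or ref.state() != opt.state():
            return step
    return None


def random_ops(rng: random.Random, length: int):
    return [(rng.choice(["deposit", "donate"]), rng.randint(1, 10**6))
            for _ in range(length)]


ops = [("deposit", 100), ("donate", 50), ("deposit", 10)]
print(first_divergence(VaultModel(), VaultModel(round_up=True), ops))
```

In this toy model the two vaults agree on any sequence of deposits alone and only split once a donation has moved the share price away from 1:1. That is the kind of bug a single-call comparison misses and a sequence comparison finds.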
Practical tips
Start with the pure math. The highest-value differential tests compare mathematical functions: swap calculations, interest accrual, pricing formulas. These are deterministic, easy to test, and where precision bugs hide.
Use Python/Rust for reference. Don't build your reference in Solidity if you can avoid it. Use a language with arbitrary-precision arithmetic. This greatly reduces the risk of both implementations sharing the same bug, such as the same overflow or rounding behavior.
Bound your inputs carefully. Differential testing generates a lot of inputs. If most of them hit trivial code paths (zero amounts, empty arrays), you're wasting cycles. Use Foundry's bound() to focus on interesting ranges.
Log the failing input. When a differential test fails, the specific input that caused divergence is gold. Log it, reproduce it, and understand why the implementations disagree.
Combine with fuzzing. Differential testing tells you what disagrees. Invariant testing and property-based testing tell you what properties should hold. Use both.
For a broader comparison of smart contract fuzzing tools, check our dedicated post.
When not to use differential testing
It's not always the right tool:
- No reference implementation exists. You're building something novel and there's nothing to compare against.
- Implementations are intentionally different. If one version adds a fee and the other doesn't, they're supposed to disagree.
- Performance isn't worth it. FFI-based cross-language testing is slow. For simple contracts, direct property testing is faster and just as effective.
In those cases, stick with standard invariant testing and direct property assertions.
Wrapping up
Differential testing is one of those techniques that's simple in concept but catches bugs that other approaches miss. The insight is that you don't need to know the right answer; you just need two sources that should agree.
In DeFi, where specs get implemented multiple times, where gas-tuned rewrites replace reference code, and where cross-chain deployments must behave identically, differential testing fits naturally.
Set up the comparison. Let the fuzzer generate inputs. Wait for the disagreement. Fix the bug.