Chapter 20 — Crucible: The Memory Fuzzer

Chapter 18 showed you a machine that hunts for a logic bug: --emit=check invents inputs to a function until one of its contracts turns false. Crucible is the twin of that idea, pointed at memory. Instead of inputs to one function it invents whole programs — entire .ig files — and runs each one through a battery of detectors that watch what the runtime does with memory: whether a value is freed twice, leaks, is read after it has been freed, or quietly comes back wrong. Logic correctness and memory correctness are different problems, so Ingle hunts them with two different tools.

Crucible is a build-time developer tool. It lives in tools/ (a generator, crucible.c, and a driver, crucible.sh), it is never shipped inside inglec, and you run the whole thing with one command:

make crucible

That builds everything it needs and sweeps a default of 150 seeds; when you want a shorter or longer run, call the driver directly with a count: tools/crucible.sh 120. Either way, a green run ends like this (the output below is from the 120-seed run) and exits 0:

crucible: 120 seeds → 120 clean, 0 distinct (0 NEW).
crucible: ✓ no new memory faults.

A run that turns up something new exits 1 and tells you exactly where the evidence is. That is the entire contract: a clean exit means the language could not be made to mishandle memory across the combinations Crucible knows how to build; a non-zero exit hands you a minimal program that proves otherwise.

Why a fuzzer, and not just more tests

Every memory bug Ingle has ever had lived at a combination of features that, taken one at a time, all worked: a value-struct double-freed when it was borrowed into a multi-slot parameter; a string interpolation that leaked its intermediate pieces; a field assignment through an array index; a value-struct double-freed when shared through an erased generic. Nobody sits down and writes a test for “a struct with a string field, stored as the value of a Map, read back inside a loop, and then interpolated” — because nobody thinks of it. Each of those bugs was found the hard way, reactively, when a real program happened to walk into the combination and crashed on the way out.

Crucible’s job is to walk into those combinations first, by the hundred, automatically. The principle the source states for itself is “no knowledge lost”: every shape that has ever bitten the language is inside the space the generator samples, so the same class of bug cannot come back unseen. On its first run it surfaced a bug where a Map whose value is an array handed back a corrupted, empty view — a combination no hand-written test had thought to try.

Why this needs its own tool. Property-based testing targets your logic, the way --emit=check does. Crucible targets the runtime memory model itself: it pits two backends against each other and watches the allocator for double-frees and leaks. Ingle frees memory deterministically from tracked ownership (Chapter 13), and that deterministic freeing is exactly the thing that needs a fuzzer pointed at it.

The generator: one seed, one program

The generator takes a seed and prints one valid Ingle program to standard output. Same seed, same program, every time — so every finding is perfectly reproducible. You can watch it work:

build/crucible 1 30

The 1 is the seed; the 30 is a loop trip-count the driver later scales up and down for the leak check. Each program is built from self-contained operations — one function per danger pattern — and every one of them folds every value it touches into a running acc:

fn op0() -> int {
    var acc = 0
    var m = map.Map<string, S>{ buckets: [], count: 0 }
    m.set("k0", S { a: 98, s: "z" })
    m.set("k1", S { a: 53, s: "q" })
    m.set("k0", S { a: 31, s: "ab" })
    match m.get("k0") { case Some(v) {
    acc = acc + v.a + v.s.len()
    } case None {} }
    match m.get("k1") { case Some(v) {
    acc = acc + v.a + v.s.len()
    } case None {} }
    return acc
}

That one stores a struct as a Map value, overwrites a key, reads two keys back, and folds the recovered a field and the heap string’s length into acc — so the heap leaf is exercised, not just the scalar. main then sums every opN() and prints the total. That total is the keystone of the whole design: it is a checksum of every value the program touched. If a value is silently dropped, duplicated, or read back wrong, the number changes — which is how Crucible catches wrong answers, not merely crashes. Run the seed-1 program straight through the VM:

inglec --emit=run seed1.ig

3722=> 0

The 3722 is the checksum main printed; => 0 is the value it returned. (print adds no newline, so the two land on the same line.)
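
The two properties described above, reproducibility from a seed and a checksum that moves when a value goes missing, can be illustrated outside Ingle. The following is a minimal Python sketch, not the code in crucible.c: generate_ops and checksum are hypothetical names, and each "op" is reduced to the list of values it would fold into acc.

```python
import random

def generate_ops(seed, n_ops=3):
    """Hypothetical stand-in for the generator: one list of values per op."""
    rng = random.Random(seed)  # private RNG, so the same seed gives the same program
    ops = []
    for _ in range(n_ops):
        count = rng.randint(2, 4)
        ops.append([rng.randint(1, 99) for _ in range(count)])
    return ops

def checksum(ops):
    """Mirror of main: fold every value of every op into one total."""
    return sum(sum(values) for values in ops)

ops = generate_ops(1)
total = checksum(ops)

# Reproducibility: regenerating from the same seed gives the identical program.
same_again = generate_ops(1) == ops

# A runtime that silently drops one value, or duplicates one, shifts the total.
dropped = [ops[0][1:]] + ops[1:]
duplicated = [ops[0] + ops[0][:1]] + ops[1:]
print(same_again, checksum(dropped) != total, checksum(duplicated) != total)
```

Because every generated value in this sketch is at least 1, losing or duplicating any single value always changes the sum. A plain sum cannot see two errors that cancel each other out, which is one reason a checksum is paired with other detectors rather than trusted alone.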

What does the generator deliberately reach for? The places memory bugs live:

Struct shapes — all-scalar, with a heap string field, with a nested struct, or with both.
Erased generic containers — those structs placed into [T], Map<string, T>, Map<string, [T]>, and Option<T>, including nested combinations, built and read back through generic helpers like fn c_pair<T>(a: T, b: T) -> [T] and fn c_keep<T>(move x: T) -> T.
Movement — passed by move and by borrow, returned, read back, mutated through an array index (arr[i].field = …), and interpolated ("row{i}-x{i}"), inside loops and across reassignment.

That list is not arbitrary. It is the union of every feature-combination that has ever produced a memory bug in Ingle.
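
To get a feel for the size of that space, the three axes above can be crossed mechanically. This is a rough Python sketch under stated assumptions: the labels paraphrase the list and are not identifiers from crucible.c, pick is a hypothetical helper, and the real generator also nests containers and wraps patterns in loops, so its space is larger than this product.

```python
import random
from itertools import product

# Labels paraphrase the prose above; they are illustrative, not the tool's own names.
STRUCT_SHAPES = ["all-scalar", "string field", "nested struct", "string + nested"]
CONTAINERS = ["[T]", "Map<string, T>", "Map<string, [T]>", "Option<T>"]
MOVEMENTS = ["move", "borrow", "return", "read back", "index mutation", "interpolation"]

# Every (shape, container, movement) triple is one candidate danger pattern.
COMBOS = list(product(STRUCT_SHAPES, CONTAINERS, MOVEMENTS))

def pick(seed):
    """Choose one combination deterministically from a seed."""
    return random.Random(seed).choice(COMBOS)

print(len(COMBOS))  # 4 * 4 * 6 = 96
print(pick(3) == pick(3))
```

Even this flattened version yields 96 combinations, which is already more than anyone is likely to cover with hand-written tests.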

The five oracles

The driver runs each generated program through five independent detectors. In fuzzing these are called oracles, because each one knows how to recognise a particular kind of wrong. The driver builds the variant compilers it needs (a drop-trace build, an AddressSanitizer build) on demand the first time you run it.

Oracle	What it catches	How
Double-drop detector	A value freed twice	A compiler built with `-DEMBER_DROP_TRACE` stamps a sentinel after each reclaim and aborts if it ever frees that object again
VM fault	A runtime fault (bad index, etc.)	Runs under the VM and looks for `runtime error: …`
ASan	Use-after-free, buffer overflow, double-free	A compiler built with AddressSanitizer (`make asan`)
RSS leak	Memory that grows super-linearly	Runs the same seed at 50 and 6000 loops and compares peak resident memory
VM↔native differential	A wrong answer, or a native-only crash	Compiles the program to a native binary, runs it, and compares output and exit code against the VM

There is a sixth thing it watches for, quietly: if the generator ever emits a program that doesn’t compile, that is a bug in the generator — an over-reach past what the language allows — and it’s reported as gen-compile-error so it can never be mistaken for a real finding.

Why five, and why these? Because each one sees something the others can’t. The runtime’s pool allocator recycles memory instead of calling free, so a plain ASan build can read a use-after-free as perfectly valid (recycled) memory and miss it entirely — which is precisely why the double-drop detector exists, stamping a sentinel that survives recycling. The differential is the only oracle that catches a silently wrong answer: a program that runs cleanly, leaks nothing, double-frees nothing, and still prints a different checksum on the two backends. The leak oracle is the only one that catches unbounded growth in a program that is otherwise correct. Memory safety is not a single property, so it does not get a single detector.
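
Two of those verdicts are simple enough to sketch as pure functions over already-collected measurements. The Python below is an illustration only: RunResult, differential_verdict, and leak_verdict are hypothetical names, the "rss-leak" label and the max_ratio threshold are assumptions, and only the diff:VM-ne-native signature comes from the driver's own output.

```python
from typing import NamedTuple, Optional

class RunResult(NamedTuple):
    stdout: str
    exit_code: int

def differential_verdict(vm: RunResult, native: RunResult) -> Optional[str]:
    """Flag any disagreement in output or exit code between the two backends."""
    if vm != native:
        return "diff:VM-ne-native"
    return None

def leak_verdict(peak_small_kb: int, peak_large_kb: int,
                 max_ratio: float = 2.0) -> Optional[str]:
    """Compare peak resident memory at a small and a large loop count.

    max_ratio is an assumed tolerance; the real driver's threshold is not
    given in the text.
    """
    if peak_large_kb > peak_small_kb * max_ratio:
        return "rss-leak"  # hypothetical label
    return None

print(differential_verdict(RunResult("3722", 0), RunResult("3722", 0)))
print(differential_verdict(RunResult("3722", 0), RunResult("3690", 0)))
print(leak_verdict(4000, 4100), leak_verdict(4000, 90000))
```

Note that the second differential case is a program that exits cleanly on both backends and still disagrees on the checksum, which is the failure mode no other oracle would notice.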

Findings: signatures, shrinking, and minimal repros

When an oracle fires, the driver does three useful things rather than merely shouting.

First, it reduces the failure to a signature — a short string such as vm-fault:runtime error: array index out of bounds, or double-drop:type_id=3, or diff:VM-ne-native. Distinct signatures are distinct findings; the same signature seen again is the same bug, reported once.

Second, for each new signature it shrinks the program to a minimal reproducer: it greedily deletes operations — the + opK() terms in main make this trivial — for as long as the signature still holds, and saves the result under tools/crucible-finds/. So you are never handed a hundred-line generated program; you get the smallest one that still fails. Here is a real one, a differential finding shrunk from its original two operations down to a single guilty loop:

// crucible seed=3 shape=3 ops=2 loops=30 — generated; do not edit.
import "std/map" as map

struct Inner { x: int }

struct S {
    a: int
    s: string
    inner: Inner
}

fn c_pair<T>(a: T, b: T) -> [T] { return [a, b] }
fn c_keep<T>(move x: T) -> T { return x }


fn op1() -> int {
    var acc = 0
    var i = 0
    loop {
        if i == 30 { break }
        let xs = c_pair(S { a: 65, s: "hello", inner: Inner { x: 31 } }, S { a: 5, s: "longerstring", inner: Inner { x: 18 } })
        acc = acc + xs[0].a + xs[1].a
        i = i + 1
    }
    return acc
}

fn main() -> int {
    var total = 0
    total = total + op1()
    print("{total}")
    return 0
}

Third, it prints one line per distinct finding — the signature, the seed, and the repro path:

── [NEW]   [diff:VM-ne-native]  seed=3  → tools/crucible-finds/find2_diff_VM_ne_native.ig  (minimal: 1 op)
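
The greedy deletion described in the second step can be sketched in a few lines. This is a Python illustration, not the driver's shell code: shrink is a hypothetical name, and still_fails stands in for "regenerate main without this op, rerun the oracle, and check that the same signature comes back".

```python
def shrink(ops, still_fails):
    """Greedily delete ops for as long as the failure signature still holds."""
    kept = list(ops)
    changed = True
    while changed:                      # repeat until a full pass deletes nothing
        changed = False
        for op in list(kept):           # iterate over a snapshot while editing kept
            candidate = [o for o in kept if o != op]
            if still_fails(candidate):  # failure survives without this op: drop it
                kept = candidate
                changed = True
    return kept

# Toy oracle: the failure reproduces whenever op1 is present.
def fails_with_op1(ops):
    return "op1" in ops

print(shrink(["op0", "op1"], fails_with_op1))  # ['op1']
```

Greedy deletion does not guarantee the globally smallest program, only one from which no single operation can be removed without losing the signature. That is usually small enough to read in one sitting.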

When the oracle that fired was the double-drop detector, the underlying evidence — written by the -DEMBER_DROP_TRACE runtime — has this shape, and the sentinel 0x5EAD5EAD is the giveaway:

*** EMBER DOUBLE-DROP obj=<addr> type=<n> ***
STRUCT type_id=<n> field_count=<n>
first drop site: <addr>

It prints both drop sites — this one and the one it stashed the first time — because for a double-free the question is never “where was it freed?” but “where were the two places that each thought they owned it?”
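
The mechanism behind that report can be modelled in miniature. The following Python sketch is an assumption-laden toy, not the runtime's C code: Pool, Obj, and DoubleDrop are hypothetical names, and the only details taken from the chapter are the sentinel value, the fact that it is stamped into the refcount field on reclaim, and the fact that a genuine re-allocation clears it.

```python
SENTINEL = 0x5EAD5EAD  # stamped into the refcount field when an object is reclaimed

class DoubleDrop(Exception):
    pass

class Obj:
    def __init__(self, type_id):
        self.type_id = type_id
        self.refcount = 1
        self.first_drop_site = None

class Pool:
    """Toy recycling pool: reclaimed objects go on a free list, never back to the OS."""
    def __init__(self):
        self.free_list = []

    def alloc(self, type_id):
        if self.free_list:
            obj = self.free_list.pop()
            obj.type_id = type_id
            obj.refcount = 1            # a genuine re-allocation clears the sentinel
            obj.first_drop_site = None
            return obj
        return Obj(type_id)

    def drop(self, obj, site):
        if obj.refcount == SENTINEL:    # same object reclaimed a second time
            raise DoubleDrop(
                f"type={obj.type_id} first drop site: {obj.first_drop_site}, "
                f"second drop site: {site}")
        obj.refcount = SENTINEL
        obj.first_drop_site = site      # stash the site for a later report
        self.free_list.append(obj)

pool = Pool()
a = pool.alloc(3)
pool.drop(a, "site-A")
try:
    pool.drop(a, "site-B")              # two owners each believed they held `a`
except DoubleDrop as err:
    print(err)
```

Because the sentinel lives in the object rather than in the allocator's bookkeeping, it stays detectable even though the memory is recycled instead of being returned with free.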

The baseline: failing only on what’s new

A bug you have already filed shouldn’t paint every future run red. tools/crucible-known.txt is the baseline: one signature per line, with # comments allowed. A signature listed there is reported as [known] and does not fail the run; only a signature that is not listed counts as NEW and flips the exit code to 1. The discipline, stated in the file’s own header, is to remove a line only when the bug is actually fixed and a clean Crucible run confirms it. As of this writing the file is empty: the two erased-generic double-free bugs that once lived there are fixed on both backends, and Crucible is green.
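
The baseline rule is small enough to state as code. This Python sketch shows one way the classification could work, assuming whole-line # comments only and blank lines ignored; load_baseline and classify are hypothetical names, and the real logic lives in the driver script.

```python
def load_baseline(text):
    """Parse baseline text: one signature per line, '#' lines are comments."""
    known = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            known.add(line)
    return known

def classify(found_signatures, known):
    """Label each distinct signature and compute the run's exit code."""
    labels = {}
    for sig in found_signatures:        # repeats of a signature collapse to one entry
        labels[sig] = "known" if sig in known else "NEW"
    exit_code = 1 if "NEW" in labels.values() else 0
    return labels, exit_code

baseline = load_baseline("# filed, not yet fixed\ndouble-drop:type_id=3\n")
labels, code = classify(
    ["double-drop:type_id=3", "diff:VM-ne-native", "diff:VM-ne-native"], baseline)
print(labels, code)
```

With an empty baseline, as the file currently is, every signature counts as NEW, so any finding at all fails the run.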

Working a finding

When make crucible hands you a NEW finding, the loop is short:

Open the minimal repro under tools/crucible-finds/. It is a real .ig file, already as small as the tool could make it.

Reproduce it by hand under the oracle that fired, so you can watch it happen. For a memory fault, reach for the tape — build the ASan + drop-trace compiler and run the repro under it:

make asan-trace
ASAN_OPTIONS=detect_leaks=0 build/inglec-trace --emit=run tools/crucible-finds/find2_diff_VM_ne_native.ig

For a diff:VM-ne-native, run both backends and compare the answers directly:

inglec --emit=run repro.ig                  # the reference answer (the VM)
inglec -o /tmp/repro repro.ig && /tmp/repro # the native answer

Fix the bug, then add a regression test under tests/ so it stays fixed — a fix without a test that exercises it isn’t finished.
Confirm green. Remove the signature from crucible-known.txt if it was baselined, and run make crucible again until it reports 0 NEW.

That is the same discipline the rest of Part IV asks for — reproduce on the smallest possible evidence, fix, lock it in with a test — applied to the one category of correctness a contract cannot describe: what the runtime does with your memory.

Fireside trivia. The double-drop detector’s sentinel is the 32-bit value 0x5EAD5EAD. Squint at it: 5EAD5EAD reads as “dead dead” — a value freed once is dead, and a value freed twice is dead twice. The pool never touches an object’s refcount field after reclaiming it, and a genuine re-allocation clears it, so if the runtime is about to free an object and finds 0x5EAD5EAD still sitting in that field, it knows for certain it’s staring at the same corpse a second time. A good sentinel is a number that could never arise by accident and that tells you what it means the moment you read it in a debugger at two in the morning.