Building with an agentic harness

Hard problem

A frontier model gives one person a team's throughput — but not a team's trustworthiness. Raw model output can't be trusted with a product whose core is an audit-grade, multi-chain ledger.

Approach

Run the engineering the way the ledger runs: plans, decisions, and remembered gotchas are the immutable inputs; an agentic harness derives the work from them; and every finding is adversarially verified before it lands. The mutable output — code, copy, config — can always be reconciled back to the record that produced it.

The demos on this site share one idea: don't store the artifact, derive it. Declare the inputs and the rules, and let a generator produce the ledger — every row reproducible, every row carrying the input and rule that made it. The deterministic-accounting post puts that plainly for a crypto ledger.

This site, and the accounting product behind it, are built the same way one level up — applied not to a ledger, but to the engineering.

The inputs are intent: plans, decisions, and a persistent memory of the gotchas that bit me last time. The generator is an agentic harness that turns that intent into code. The discipline is the same as the ledger's: the trail that produced the work is kept, so a change to intent regenerates the work rather than patching it by hand.

The method

The harness is a loop, not a chatbot. A task is decomposed, fanned out across independent agents that each hold only their slice of context, and — the part that matters — every finding is adversarially verified before it is allowed to land. Cheap, parallel skepticism is what keeps one person's output honest.

async function run(task: Intent) {
	const plan = await decompose(task)              // intent -> work-list
	const found = await parallel(                   // fan out, each agent blind
		plan.map((unit) => () => investigate(unit)),  //   to the others
	)
	const real = await parallel(
		found.map((f) => () => verify(f)),            // refute by default;
	).then((v) => v.filter((x) => x.survivesRefutation)) // keep what survives
	persist(real, memory)                           // gotchas outlive the task
	return real
}

Underneath that loop sit local AI pipelines doing the CPU-bound, unglamorous work — transcription, OCR, full-text dedup over a large corpus — so a single person can hold a lot of state correct at once. Web workers for the parsing, local cache, no server round-trip. The same systems work that makes a ledger trustworthy at scale makes the harness trustworthy at scale.

Three things this is not.

It is not autonomy. A human still owns correctness and judgment — the harness is leverage, not a replacement for either. The cases that silently break a ledger (a stake that's legally a disposal, internal transfers, a chain reorg) get caught because someone decided they matter, not because a model volunteered them.

It is not the leverage itself — that's Claude's. What's built here is the part the model doesn't come with: the verification and provenance discipline that makes its output trustworthy.

And it is not the product. The product is the accounting. The throughput comes from the model — anyone with Claude gets that. The harness is what makes that throughput safe to point at an audit-grade, multi-chain ledger: nothing lands unverified, and everything that lands traces back to the intent that produced it.

I intend to keep developing this article — what the pipelines actually are, where the harness earns its keep and where it doesn't — explanations are to come.