jeff.leung
Case study 01 · Updated 2026-05-27

AI-Assisted Delivery Reliability System

An engineering system that turns AI-assisted software development into reviewable, repeatable delivery — explicit plans, scoped agent tools, verification gates, and sanitized receipts.

AI-assisted engineering · Backend / Platform · Verification · Developer experience
Problem

Treating coding agents as freeform assistants produces work that is fast to generate but expensive to review. Reviewers cannot trust outputs they cannot trace, and engineering leaders cannot ship AI-assisted work into production without explicit guardrails around scope, tools, and verification.

Constraints
  • Must work across multiple coding-agent hosts without per-host rewrites of the underlying engineering process.
  • Must preserve human review authority — agents propose, humans approve.
  • Must produce sanitizable artifacts so privacy-sensitive work can still be discussed externally.
  • Must run inside a single engineer's workstation without a centralized server.
Architecture

The reliability system sits between an engineer’s intent and a coding agent’s output. It treats each task as a unit of work with a written plan, a scoped toolset, a verification step, and an artifact at the end. The goal is not to suppress the agent; it is to make the agent’s contribution legible.

Phases of a reliable session

  1. Intake. The engineer states what they want and what they explicitly do not want. The system records this verbatim as the scope.
  2. Plan-master pass. A planning step turns the intent into a written plan with files to touch, sequencing, assumptions, and a confidence declaration. The plan is the contract for the rest of the session.
  3. Scoped tools. The agent executes only through typed tools whose argument shapes are inspectable. Each invocation is logged.
  4. Implementation. The agent works task-by-task. It is allowed to ask clarifying questions when the plan diverges from reality.
  5. Verification gates. Builds, type checks, smoke tests, and lint runs are required before the system marks anything complete. Output is captured, not summarized.
  6. Work receipt. A structured record — task, plan, diff, verification output, sanitized summary — is written to the receipts ledger.
  7. Reviewable output. The receipt becomes the basis for human review and, when appropriate, external write-ups like this case study.
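
The plan contract and scoped tools from phases 2 and 3 could be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the `Plan` and `ScopedToolbox` names, their fields, and the rule that file edits are limited to the plan's listed files are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Plan:
    """Written contract from the planning pass (field names are hypothetical)."""
    scope: str
    non_goals: tuple[str, ...]
    files_to_touch: tuple[str, ...]
    assumptions: tuple[str, ...]
    confidence: str  # e.g. "high", "medium", "low"


@dataclass
class ScopedToolbox:
    """Exposes narrowly typed tools and logs every invocation."""
    plan: Plan
    log: list[dict] = field(default_factory=list)

    def edit_file(self, path: str, new_text: str,
                  write: Callable[[str, str], None]) -> bool:
        # Assumption: an edit is in scope only if the plan names the file.
        allowed = path in self.plan.files_to_touch
        # Log the call whether or not it is allowed, so the audit trail is complete.
        self.log.append({"tool": "edit_file", "path": path, "allowed": allowed})
        if allowed:
            write(path, new_text)
        return allowed


# Usage: an in-memory dict stands in for the filesystem.
plan = Plan(
    scope="Add retry to the fetch helper",
    non_goals=("No public API changes",),
    files_to_touch=("src/fetch.py",),
    assumptions=("Existing tests cover the helper",),
    confidence="medium",
)
written: dict[str, str] = {}
box = ScopedToolbox(plan)
box.edit_file("src/fetch.py", "# retry logic here\n", written.__setitem__)
box.edit_file("src/other.py", "# out of scope\n", written.__setitem__)
```

In this sketch the second call is refused but still logged, which keeps the record useful to a reviewer even when the agent strays from the plan.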

How sanitization is enforced

Sanitization is part of the receipt schema, not an afterthought. Each receipt records whether its summary is publication-ready, what was redacted, and which evidence modules (diagrams, snapshots, callouts) are safe to surface externally. This case study itself was produced by following that gate.
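
One way to picture a receipt schema that carries its own sanitization state is sketched below. The field names, the `external_view` helper, and the JSON-lines ledger format are assumptions for illustration; the source only states that a receipt records publication readiness, redactions, and safe evidence modules.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Receipt:
    """Structured work receipt (hypothetical field names)."""
    task: str
    plan: str
    diff: str
    verification_output: str
    sanitized_summary: str
    publication_ready: bool = False
    redactions: list[str] = field(default_factory=list)
    safe_evidence: list[str] = field(default_factory=list)  # e.g. "diagram"


def to_ledger_line(receipt: Receipt) -> str:
    """Serialize the full receipt as one JSON line for a private ledger."""
    return json.dumps(asdict(receipt))


def external_view(receipt: Receipt) -> dict:
    """Return only fields meant for sharing; refuse if the gate is closed."""
    if not receipt.publication_ready:
        raise PermissionError("receipt is not marked publication-ready")
    return {
        "summary": receipt.sanitized_summary,
        "redactions": list(receipt.redactions),
        "evidence": list(receipt.safe_evidence),
    }
```

Because the external view is built from an allowlist of fields, the diff and raw verification output are never part of what is shared in this sketch, regardless of how the receipt is filled in.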

Reliability mechanisms
  • Plan-first contract — every non-trivial task produces a written plan, with explicit scope, non-goals, assumptions, and a confidence declaration before execution begins.
  • Scoped tooling — agents act through narrowly typed tools (file edits, build, test, shell) rather than open-ended capabilities, so each step is auditable.
  • Verification before completion — work is only claimed complete after a verification command (build, type-check, smoke test) has been run and its output recorded.
  • Sanitized work receipts — each session emits a structured artifact (task, plan, diff, verification result, sanitized summary) suitable for review and selective external sharing.
  • Reduced-motion fallback for review — engineers can read receipts as static markdown without depending on live agent state.
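
The "verification before completion" mechanism above can be approximated with a small gate runner. This is a sketch under stated assumptions: `GateResult` and `run_gates` are hypothetical names, and the rule that an empty gate list does not count as passing is a design choice made for the example, not something the source specifies.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    command: list[str]
    returncode: int
    output: str  # full captured stdout and stderr, not a summary


def run_gates(commands: list[list[str]]) -> tuple[bool, list[GateResult]]:
    """Run every verification command and record its raw output."""
    results: list[GateResult] = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append(GateResult(cmd, proc.returncode, proc.stdout + proc.stderr))
    # Assumption: with no gates configured, nothing has been verified,
    # so the work is not considered complete.
    passed = bool(results) and all(r.returncode == 0 for r in results)
    return passed, results


# Usage: stand-in commands; real gates would be build, type-check, or test runs.
ok, results = run_gates([
    [sys.executable, "-c", "print('build ok')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
])
```

All commands run even after a failure, so the receipt can show the complete verification picture rather than only the first broken gate.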
Tradeoffs
  • Plans add latency to small tasks; the system absorbs this by reusing plans for recurring work.
  • Strict verification gates can block on flaky tests; the answer is to fix the gate, not to bypass it.
  • Cross-host portability means leaning on conventions (skills, sub-agents, MDX-style specs) rather than any single vendor's UX.
What this proves

The system demonstrates that AI-assisted software work can be held to the same review standards as traditional engineering: plans you can read, tools you can audit, verification you can rerun, and receipts you can share without leaking private context.

Privacy notes

All language in this case study uses domain-safe labels. No employer, client, repository, package, or hostname is named. Examples are abstractions of patterns implemented across multiple projects.