Case study 04 · Updated 2026-05-27

Work Receipts Ledger

An evidence model for reviewable AI-assisted delivery — agent sessions grouped into work items, assigned quality bands, audited against privacy guards, and selectively exported as sanitized snapshots.

Verification · Security · Developer experience · AI-assisted engineering

Problem

Raw productivity dashboards collapse AI-assisted work into vanity counts that hide the question that matters: would I publish this externally? A productive session is not the same as a reviewable one, and a reviewable session is not the same as a publishable one.

Constraints

Must operate over real session logs without leaking private content.
Must distinguish quality bands and publication-readiness, not just counts.
Must produce sanitized exports suitable for external case studies like this one.
Must support privacy guards — claims that would expose internal context block the export.

Architecture

The work receipts ledger is not a dashboard. A dashboard optimizes for at-a-glance counts. The ledger optimizes for reviewability: would I show this to an engineering leader without prompting a follow-up question?

Three layers of judgment

Grouping. Agent sessions are grouped by work item, not counted individually. A multi-session refactor is one item, not five.
Quality bands. Each work item is placed into a band based on whether its plan was followed, whether verification passed, and whether a human reviewer signed off.
Privacy guards. Before any export, automated checks compare the proposed snapshot against the sanitization checklist. Failing items are excluded.
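
The first two layers can be sketched in a few lines of Python. This is a minimal illustration, not the ledger's actual code: the `Session` fields, the roll-up rule (plan followed in every session, verification taken from the final session, sign-off in any session), and the mapping from signals to bands are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical per-session record; field names are illustrative."""
    session_id: str
    work_item_id: str
    plan_followed: bool
    verification_passed: bool
    reviewer_signed_off: bool


def group_by_work_item(sessions):
    """Group sessions by work item, preserving their input order."""
    groups = defaultdict(list)
    for session in sessions:
        groups[session.work_item_id].append(session)
    return dict(groups)


def assign_band(item_sessions):
    """Assumed rule: roll session signals up to one band per work item."""
    plan_ok = all(s.plan_followed for s in item_sessions)
    # Assumes sessions are in chronological order, so the last one
    # reflects the final verification state.
    verified = item_sessions[-1].verification_passed
    signed_off = any(s.reviewer_signed_off for s in item_sessions)
    if plan_ok and verified and signed_off:
        return "publishable"
    if plan_ok and verified:
        return "reviewable"
    return "exploratory"
```

Under these assumptions, a refactor spread over five sessions produces one grouped entry and one band, which is the behavior the grouping layer describes.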

What the snapshot includes — and what it does not

The on-site chart shows counts and percentages: how many items fell into each band, what fraction is sanitized-export-ready, how many were rejected by a privacy guard. It does not show titles, session text, or any direct identifier. The underlying JSON file is committed to the repository so the portfolio can render without any private data.
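
One way to keep private fields out of the committed file is to build the snapshot from aggregates only. The sketch below assumes a hypothetical `WorkItem` record and hypothetical JSON keys; the real snapshot schema is not shown in this case study, and treating "export-ready" as "publishable and not rejected by a guard" is likewise an assumption.

```python
import json
from collections import Counter
from dataclasses import dataclass

BANDS = ("exploratory", "reviewable", "publishable")


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical ledger row; `title` is private and must not be exported."""
    title: str
    band: str
    guard_rejected: bool


def build_snapshot(items):
    """Return aggregate counts only; no per-item fields are copied."""
    counts = Counter(item.band for item in items)
    return {
        "total_items": len(items),
        "bands": {band: counts.get(band, 0) for band in BANDS},
        # Assumed definition of export readiness (see lead-in).
        "exports_ready": sum(
            1 for i in items if i.band == "publishable" and not i.guard_rejected
        ),
        "guard_rejections": sum(1 for i in items if i.guard_rejected),
    }


def to_json(snapshot):
    """Serialize with stable key order so committed diffs stay small."""
    return json.dumps(snapshot, indent=2, sort_keys=True)
```

Because the builder never reads `title` into the output, a title cannot leak through this path even if a later guard check is skipped.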

Recent 8 weeks (sanitized)

Exploratory: 42 (53%)
Reviewable: 27 (34%)
Publishable: 11 (14%)

Sanitized exports ready: 11
Privacy-guard rejections: 6

Counts are sanitized. The live ledger is not read at runtime.
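
The percentages follow from the band counts. Reading the figures above as 42, 27, and 11 items (80 in total), the short check below recomputes each share. The rounding mode is an assumption: rounding halves up reproduces the displayed 53%, whereas Python's built-in `round` uses round-half-to-even and would turn 52.5 into 52. The helper name `percent_half_up` is hypothetical.

```python
def percent_half_up(count: int, total: int) -> int:
    """Whole-number percentage, rounding halves up, using integer math only."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Equivalent to floor(100 * count / total + 0.5) without float error.
    return (200 * count + total) // (2 * total)


band_counts = {"exploratory": 42, "reviewable": 27, "publishable": 11}
total = sum(band_counts.values())
percentages = {
    band: percent_half_up(count, total) for band, count in band_counts.items()
}
```

The rounded shares add up to 101 rather than 100. That is a rounding effect (52.5, 33.75, and 13.75 each round up), not a counting error.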

Reliability mechanisms

Work-item grouping — sessions that belong to the same task are aggregated, not double-counted.
Quality bands — each work item is assigned a band such as exploratory, reviewable, or publishable based on plan adherence, verification status, and reviewer sign-off.
Privacy guards — automated checks flag content that violates the sanitization checklist before any export is allowed.
Snapshot model — only manually reviewed, sanitized snapshots are exposed to external consumers (such as this portfolio).
Audit trail — every snapshot records which guard checks ran and which items were excluded.
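
The guard and audit-trail mechanisms can be pictured together: each guard is a named check, and the export step records which checks ran and which items they excluded. The two guards below (an email pattern and an `.internal` hostname match) are invented for illustration; the actual sanitization checklist is not published here, and `run_guards` and `AuditRecord` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def no_email_addresses(text: str) -> bool:
    """Pass when the text contains nothing that looks like an email."""
    return EMAIL.search(text) is None


def no_internal_hosts(text: str) -> bool:
    """Pass when the text mentions no `.internal` hostname."""
    return ".internal" not in text.lower()


# Illustrative checklist; the real one would be longer and stricter.
GUARDS = {
    "no_email_addresses": no_email_addresses,
    "no_internal_hosts": no_internal_hosts,
}


@dataclass
class AuditRecord:
    checks_run: list[str]
    excluded: dict[str, list[str]]  # item id -> names of failed guards


def run_guards(candidates: dict[str, str], guards=GUARDS):
    """Split candidates into exportable ids and an audit record."""
    exported, excluded = [], {}
    for item_id, summary in candidates.items():
        failed = [name for name, check in guards.items() if not check(summary)]
        if failed:
            excluded[item_id] = failed
        else:
            exported.append(item_id)
    return exported, AuditRecord(checks_run=list(guards), excluded=excluded)
```

Recording the names of the failed guards, rather than the offending text, keeps the audit trail itself free of the content the guards exist to block.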

Tradeoffs

Quality bands require human review; pure automation would be faster but less trustworthy.
Privacy guards add friction; that friction is the point.
Snapshots intentionally lag the live ledger; freshness is traded for reviewability.

What this proves

This case study demonstrates an evidence model that separates productivity from reviewability, and reviewability from publication-readiness. It also shows what disciplined AI-assisted delivery looks like when measured against the right question.

Privacy notes

The on-site evidence module uses a sanitized snapshot only. The live ledger is never read at runtime. Counts and percentages are derived from a manually reviewed export. No work-item titles or session text are surfaced.
