paiml · noahgift · May 20, 2026
diff --git a/evidence/phase-6/1.5b-calibration-run.md b/evidence/phase-6/1.5b-calibration-run.md
@@ -97,13 +97,16 @@ The compound effect of Option A + B in the squashed commit:
 
 - [`evidence/under-contract/`](../under-contract/) — M270 treatment evidence
 - [`evidence/under-contract-control/`](../under-contract-control/) — M280 control evidence (in flight)
-- [aprender#1789](https://github.com/paiml/aprender/issues/1789) — Qwen3-MoE F32 routing deep fix. **CLOSED at code level 2026-05-19 via squash `9c974524f` + `25527499c`**. V1_004 still BLOCKED on M32d KV cache work — see [30b-moe-empirical-2026-05-19.md](30b-moe-empirical-2026-05-19.md).
+- [aprender#1789](https://github.com/paiml/aprender/issues/1789) — Qwen3-MoE F32 routing deep fix. **CLOSED at code level 2026-05-19**. V1_004 prerequisite **MET 2026-05-20 via M32d** (paiml/aprender#1832) — see [m32d-shipped-2026-05-20.md](m32d-shipped-2026-05-20.md). Bench discharge pending operator dispatch.
 - [aprender#1806](https://github.com/paiml/aprender/pull/1806) — Option A: arch-guard + scope doc + provable contract. **MERGED**.
 - [aprender#1807](https://github.com/paiml/aprender/pull/1807) — Option B: full MoE dispatch via run_qwen3_moe_generate. **MERGED** into #1806's branch.
 - [aprender#1812](https://github.com/paiml/aprender/pull/1812) — Option B follow-up: apr-cli serve mapped_gguf_model wire + configurable HTTP timeout. **MERGED**.
-- [aprender#1814](https://github.com/paiml/aprender/pull/1814) — max_tokens cap env-configurable. **OPEN** (CI green, post-rebase merge pending).
-- [30b-moe-empirical-2026-05-19.md](30b-moe-empirical-2026-05-19.md) — 5-run timeout-class progression evidence + V1_004 blocking analysis
-- [aprender qwen3-moe-serve-dispatch-fix.md](https://github.com/paiml/aprender/blob/main/docs/specifications/qwen3-moe-serve-dispatch-fix.md) — upstream scope doc (now on main)
-- [aprender qwen3-moe-serve-dispatch-v1.yaml](https://github.com/paiml/aprender/blob/main/contracts/qwen3-moe-serve-dispatch-v1.yaml) — upstream provable contract v1.1.0 (now on main)
+- [aprender#1814](https://github.com/paiml/aprender/pull/1814) — max_tokens cap env-configurable. **MERGED**.
+- [aprender#1819](https://github.com/paiml/aprender/pull/1819) — V1_001 + V1_003 integration test. **MERGED**.
+- [aprender#1826](https://github.com/paiml/aprender/pull/1826) — M32d scope doc. **MERGED**.
+- [aprender#1832](https://github.com/paiml/aprender/pull/1832) — M32d KV cache implementation (19× speedup; 9.62 tok/s sustained). **OPEN, in CI**.
+- [30b-moe-empirical-2026-05-19.md](30b-moe-empirical-2026-05-19.md) — 5-run timeout-class progression evidence + V1_004 blocking analysis (pre-M32d)
+- [m32d-shipped-2026-05-20.md](m32d-shipped-2026-05-20.md) — M32d empirical + V1_004 dispatch readiness checklist (post-M32d)
+- [aprender qwen3-moe-serve-dispatch-v1.yaml](https://github.com/paiml/aprender/blob/main/contracts/qwen3-moe-serve-dispatch-v1.yaml) — upstream provable contract (v1.1.1 on main; v1.2.0 in #1832 once merged)
 - [phase-6-results-and-next-steps.md](../../docs/specifications/phase-6-results-and-next-steps.md) — M278 synthesis doc
 - [phase-6-design-audit.md § 4](../../docs/specifications/phase-6-design-audit.md) — Popperian falsifier framing the 0/0 ratio resolves
diff --git a/evidence/phase-6/m32d-shipped-2026-05-20.md b/evidence/phase-6/m32d-shipped-2026-05-20.md
@@ -0,0 +1,73 @@
+# M32d shipped — KV cache for qwen3_moe path; V1_004 dispatch ready
+
+[Top spec](../../docs/specifications/claude-code-parity-apr-poc.md) | [Phase 6 plan](../../docs/specifications/phase-6-under-contract-bench-plan.md) | [1.5B calibration](1.5b-calibration-run.md) | [30B-MoE timeout evidence](30b-moe-empirical-2026-05-19.md)
+
+**Status (2026-05-20, M286)**: Upstream **M32d KV cache for the qwen3_moe inference path SHIPPED** as paiml/aprender#1832 (open; in CI). The operator switched from Option (b), engineer-driven, to Option (a), in-session implementation, and delivered it as one PR with an empirically validated 19× speedup. The V1_004 prerequisite is met. Bench dispatch is now operator-actionable, with a tractable wall time of ~10 hours per run.
+
+## Upstream M32d empirical results
+
+On Qwen3-Coder-30B-A3B-Instruct-Q4_K_M (the bench's target model):
+
+| Metric | Pre-M32d | Post-M32d | Change |
+|---|---|---|---|
+| Sustained throughput (32-token gen, 9-token prompt) | ~0.5 tok/s | **9.62 tok/s** | 19× faster |
+| Wall time, 4-token gen | 1002 ms | 553 ms | 1.8× faster |
+| Numerical equivalence vs full-prefill (greedy) | n/a | byte-identical | ✓ |
+| V1_001 + V1_003 smoke (#1819 test) | 7.84 s | 9.39 s | stable |
+
+Two new cargo tests pin the invariants in CI (env-gated on `QWEN3_MOE_GGUF_PATH`, `#[ignore]`'d by default):
+
+- `crates/aprender-serve/tests/moe_kv_cache_equivalence.rs` — generates 4 tokens via M32d cache-on AND legacy full-prefill loop; asserts greedy outputs byte-identical
+- `crates/aprender-serve/tests/m32d_perf.rs` — asserts ≥ 5 tok/s sustained on 32-token gen (floor pinned via `M32D_TPS_FLOOR` constant)
+
+## V1_004 dispatch readiness
+
+The Phase 6 bench (`scripts/phase-6-bench.sh`) is unchanged from the M280 + M284 work. With M32d landed, the dispatch is operator-actionable:
+
+```bash
+APR_MODEL=/home/noah/models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \
+PHASE6_COMPLIANCE_ENFORCED=1 \
+PHASE6_MAX_TURNS=20 \
+PHASE6_WALL_SECONDS=3600 \
+APR_TIMEOUT_S=900 \
+APR_AGENT_HTTP_TIMEOUT_S=1500 \
+APR_AGENT_MAX_TOKENS_CAP=1024 \
+bash scripts/phase-6-bench.sh 2>&1 | tee /tmp/phase-6-30b-post-m32d-treatment.log
+```
+
+**Pre-dispatch checklist** (operator-coordinated):
+
+1. ✅ M32d PR (#1832) MERGED into aprender main
+2. ✅ apr binary rebuilt from main + installed at `/home/noah/.local/bin/apr`
+3. ✅ Verify version: `apr --version` shows a hash newer than `e58925095` (#1814)
+4. ✅ Smoke check: `apr serve run /home/noah/models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --port 19999 --host 127.0.0.1 --gpu`, POST a small chat request, verify ≥ 5 tok/s
+5. ✅ Run treatment bench (cmd above). Wall: ~10 hours estimated at 9.62 tok/s × 20 fixtures.
+6. ✅ Run control bench: same command with `PHASE6_COMPLIANCE_ENFORCED=0`, `tee /tmp/phase-6-30b-post-m32d-control.log`. Wall: ~10 hours.
+7. ✅ Both scores.json files land at `evidence/under-contract/scores.json` (treatment) + `evidence/under-contract-control/scores.json` (control).
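
The ~10 hour wall figure in the checklist can be sanity-checked with a rough decode-only bound. The sketch below is not part of `scripts/phase-6-bench.sh`; the helper name `estimate_wall_hours` is hypothetical, and it assumes that `PHASE6_MAX_TURNS` is a per-fixture turn limit and that `APR_AGENT_MAX_TOKENS_CAP` caps generated tokens per turn. It ignores prompt processing, compliance checks, and early exits.

```shell
#!/bin/sh
# Hypothetical helper: worst-case decode time if every turn of every
# fixture generated the full token cap at the measured throughput.
# Assumed inputs: 20 fixtures, 20 turns per fixture, 1024 tokens per turn.
estimate_wall_hours() {
  # $1 = fixtures, $2 = max turns, $3 = max tokens per turn, $4 = tokens/second
  awk -v f="$1" -v t="$2" -v m="$3" -v r="$4" \
    'BEGIN { printf "%.1f\n", (f * t * m) / r / 3600 }'
}

estimate_wall_hours 20 20 1024 9.62   # prints 11.8
```

Under these assumptions the bound is about 11.8 hours, the same order as the ~10 hour estimate; real runs that stop before the turn or token caps would finish sooner.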
+
+Acceptance: `student_pass_rate > 0` in EITHER scores.json discharges V1_004.
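
The acceptance rule can be sketched as a small shell check. This is an illustration only: the function names are hypothetical, and it assumes each `scores.json` exposes a numeric `"student_pass_rate"` key that a simple `sed` pattern can find. The actual schema is not shown in this doc, and the real analyzer may read it differently.

```shell
#!/bin/sh
# Hypothetical sketch of the V1_004 acceptance rule.
# Assumption: scores.json contains a key like "student_pass_rate": 0.15

# Print the first student_pass_rate value found in the given file.
student_pass_rate() {
  sed -n 's/.*"student_pass_rate"[[:space:]]*:[[:space:]]*\([0-9.eE+-]*\).*/\1/p' "$1" | head -n 1
}

# Succeed if ANY of the given files has student_pass_rate > 0.
v1_004_discharged() {
  for f in "$@"; do
    [ -f "$f" ] || continue
    rate=$(student_pass_rate "$f")
    # awk does the floating-point comparison; a missing value counts as 0.
    if awk -v r="${rate:-0}" 'BEGIN { exit !(r > 0) }'; then
      return 0
    fi
  done
  return 1
}

if v1_004_discharged evidence/under-contract/scores.json \
                     evidence/under-contract-control/scores.json; then
  echo "V1_004 discharged"
else
  echo "V1_004 still open"
fi
```

Missing files are skipped rather than treated as failures, so the check can be run after the treatment bench alone.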
+
+## Expected outcome
+
+Given Qwen3-Coder-30B-A3B's capacity (much higher than the 1.5B baseline), under-contract dispatch SHOULD produce a non-zero student pass rate on at least some leetcode / unix-utility fixtures. The exact pass rate is unknown until the bench runs; producing that number is the point of the measurement.
+
+The (treatment, control) pair lets the analyzer compute a meaningful `compliance_cost_ratio`:
+
+- `student_pass_rate(treatment)` — pass rate when per-turn `pmat comply check --strict` is enforced
+- `student_pass_rate(control)` — pass rate when compliance is OFF
+- Ratio `= treatment / control` indicates the cost of the contract discipline on a STILL-CAPABLE student (vs the M280 1.5B floor, where both rates were 0 and the ratio was undefined)
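
As a worked illustration of the ratio, the sketch below divides a treatment pass rate by a control pass rate and reports the 0/0 case explicitly. The function name `compliance_cost_ratio` mirrors the metric name but is a hypothetical helper, not the analyzer's code, and the pass rates in the example calls are made-up inputs, not bench results.

```shell
#!/bin/sh
# Hypothetical helper: treatment / control, with the zero-control case
# reported as "undefined" instead of dividing by zero.
compliance_cost_ratio() {
  # $1 = student_pass_rate(treatment), $2 = student_pass_rate(control)
  awk -v t="$1" -v c="$2" 'BEGIN {
    if (c == 0) { print "undefined"; exit }
    printf "%.2f\n", t / c
  }'
}

compliance_cost_ratio 0.30 0.40   # made-up rates; prints 0.75
compliance_cost_ratio 0 0         # the 1.5B floor case; prints undefined
```

A ratio below 1 would mean fewer fixtures pass with compliance enforced than without it, and a ratio near 1 would suggest the contract discipline costs little on this student.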
+
+## What's NOT in this M286 doc
+
+- No new contract gates (V1_004 is unchanged; only its PREREQUISITE status flipped from BLOCKED to MET)
+- No new CCPA-side code (the bench script + analyzer + harness all unchanged)
+- No bench dispatch itself (operator-coordinated, ~10hr wall)
+
+## Cross-references
+
+- [aprender#1832](https://github.com/paiml/aprender/pull/1832) — M32d KV cache implementation (open; in CI as of 2026-05-20)
+- [aprender#1829](https://github.com/paiml/aprender/pull/1829) — superseded by #1832 (closed)
+- [`docs/specifications/m32d-moe-kv-cache-scope.md`](https://github.com/paiml/aprender/blob/main/docs/specifications/m32d-moe-kv-cache-scope.md) — scope + engineer playbook (referenced for historical context)
+- [`contracts/qwen3-moe-serve-dispatch-v1.yaml`](https://github.com/paiml/aprender/blob/main/contracts/qwen3-moe-serve-dispatch-v1.yaml) v1.2.0 — V1_004 `prerequisite_status: MET`
+- [`evidence/phase-6/30b-moe-empirical-2026-05-19.md`](30b-moe-empirical-2026-05-19.md) — pre-M32d timeout-class evidence (5 dispatches)
+- [`evidence/phase-6/1.5b-calibration-run.md`](1.5b-calibration-run.md) — M270/M280 1.5B baseline