diff --git a/evidence/phase-6/1.5b-calibration-run.md b/evidence/phase-6/1.5b-calibration-run.md
index fcbd379..1187e09 100644
--- a/evidence/phase-6/1.5b-calibration-run.md
+++ b/evidence/phase-6/1.5b-calibration-run.md
@@ -97,13 +97,16 @@ The compound effect of Option A + B in the squashed commit:
 
 - [`evidence/under-contract/`](../under-contract/) — M270 treatment evidence
 - [`evidence/under-contract-control/`](../under-contract-control/) — M280 control evidence (in flight)
-- [aprender#1789](https://github.com/paiml/aprender/issues/1789) — Qwen3-MoE F32 routing deep fix. **CLOSED at code level 2026-05-19 via squash `9c974524f` + `25527499c`**. V1_004 still BLOCKED on M32d KV cache work — see [30b-moe-empirical-2026-05-19.md](30b-moe-empirical-2026-05-19.md).
+- [aprender#1789](https://github.com/paiml/aprender/issues/1789) — Qwen3-MoE F32 routing deep fix. **CLOSED at code level 2026-05-19**. V1_004 prerequisite **MET 2026-05-20 via M32d** (paiml/aprender#1832) — see [m32d-shipped-2026-05-20.md](m32d-shipped-2026-05-20.md). Bench discharge pending operator dispatch.
 - [aprender#1806](https://github.com/paiml/aprender/pull/1806) — Option A: arch-guard + scope doc + provable contract. **MERGED**.
 - [aprender#1807](https://github.com/paiml/aprender/pull/1807) — Option B: full MoE dispatch via run_qwen3_moe_generate. **MERGED** into #1806's branch.
 - [aprender#1812](https://github.com/paiml/aprender/pull/1812) — Option B follow-up: apr-cli serve mapped_gguf_model wire + configurable HTTP timeout. **MERGED**.
-- [aprender#1814](https://github.com/paiml/aprender/pull/1814) — max_tokens cap env-configurable. **OPEN** (CI green, post-rebase merge pending).
-- [30b-moe-empirical-2026-05-19.md](30b-moe-empirical-2026-05-19.md) — 5-run timeout-class progression evidence + V1_004 blocking analysis
-- [aprender qwen3-moe-serve-dispatch-fix.md](https://github.com/paiml/aprender/blob/main/docs/specifications/qwen3-moe-serve-dispatch-fix.md) — upstream scope doc (now on main)
-- [aprender qwen3-moe-serve-dispatch-v1.yaml](https://github.com/paiml/aprender/blob/main/contracts/qwen3-moe-serve-dispatch-v1.yaml) — upstream provable contract v1.1.0 (now on main)
+- [aprender#1814](https://github.com/paiml/aprender/pull/1814) — max_tokens cap env-configurable. **MERGED**.
+- [aprender#1819](https://github.com/paiml/aprender/pull/1819) — V1_001 + V1_003 integration test. **MERGED**.
+- [aprender#1826](https://github.com/paiml/aprender/pull/1826) — M32d scope doc. **MERGED**.
+- [aprender#1832](https://github.com/paiml/aprender/pull/1832) — M32d KV cache implementation (19× speedup; 9.62 tok/s sustained). **OPEN, in CI**.
+- [30b-moe-empirical-2026-05-19.md](30b-moe-empirical-2026-05-19.md) — 5-run timeout-class progression evidence + V1_004 blocking analysis (pre-M32d)
+- [m32d-shipped-2026-05-20.md](m32d-shipped-2026-05-20.md) — M32d empirical + V1_004 dispatch readiness checklist (post-M32d)
+- [aprender qwen3-moe-serve-dispatch-v1.yaml](https://github.com/paiml/aprender/blob/main/contracts/qwen3-moe-serve-dispatch-v1.yaml) — upstream provable contract (v1.1.1 on main; v1.2.0 in #1832 once merged)
 - [phase-6-results-and-next-steps.md](../../docs/specifications/phase-6-results-and-next-steps.md) — M278 synthesis doc
 - [phase-6-design-audit.md § 4](../../docs/specifications/phase-6-design-audit.md) — Popperian falsifier framing the 0/0 ratio resolves
diff --git a/evidence/phase-6/m32d-shipped-2026-05-20.md b/evidence/phase-6/m32d-shipped-2026-05-20.md
new file mode 100644
index 0000000..b892492
--- /dev/null
+++ b/evidence/phase-6/m32d-shipped-2026-05-20.md
@@ -0,0 +1,73 @@
+# M32d shipped — KV cache for qwen3_moe path; V1_004 dispatch ready
+
+[Top spec](../../docs/specifications/claude-code-parity-apr-poc.md) | [Phase 6 plan](../../docs/specifications/phase-6-under-contract-bench-plan.md) | [1.5B calibration](1.5b-calibration-run.md) | [30B-MoE timeout evidence](30b-moe-empirical-2026-05-19.md)
+
+**Status (2026-05-20, M286)**: Upstream **M32d KV cache for qwen3_moe inference path SHIPPED** at paiml/aprender#1832 (open; in CI). Operator flipped from Option (b) engineer-driven to Option (a) in-session implementation; delivered as one PR with 19× speedup empirically validated. V1_004 prerequisite met. Bench dispatch now operator-actionable on a tractable (~10hr) wall.
+
+## Upstream M32d empirical results
+
+On Qwen3-Coder-30B-A3B-Instruct-Q4_K_M (the bench's target model):
+
+| Metric | Pre-M32d | Post-M32d | Speedup |
+|---|---|---|---|
+| Sustained throughput (32-token gen, 9-token prompt) | ~0.5 tok/s | **9.62 tok/s** | 19× |
+| Wall on 4-token gen | 1002ms | 553ms | 1.8× |
+| Numerical equivalence vs full-prefill (greedy) | — | byte-identical | ✓ |
+| V1_001 + V1_003 smoke (#1819 test) | 7.84s | 9.39s | stable |
+
+Two new cargo tests pin the invariants in CI (env-gated on `QWEN3_MOE_GGUF_PATH`, `#[ignore]`'d by default):
+
+- `crates/aprender-serve/tests/moe_kv_cache_equivalence.rs` — generates 4 tokens via M32d cache-on AND legacy full-prefill loop; asserts greedy outputs byte-identical
+- `crates/aprender-serve/tests/m32d_perf.rs` — asserts ≥ 5 tok/s sustained on 32-token gen (floor pinned via `M32D_TPS_FLOOR` constant)
+
+## V1_004 dispatch readiness
+
+The Phase 6 bench (`scripts/phase-6-bench.sh`) is unchanged from the M280 + M284 work. With M32d landed, the dispatch is operator-actionable:
+
+```bash
+APR_MODEL=/home/noah/models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \
+PHASE6_COMPLIANCE_ENFORCED=1 \
+PHASE6_MAX_TURNS=20 \
+PHASE6_WALL_SECONDS=3600 \
+APR_TIMEOUT_S=900 \
+APR_AGENT_HTTP_TIMEOUT_S=1500 \
+APR_AGENT_MAX_TOKENS_CAP=1024 \
+bash scripts/phase-6-bench.sh 2>&1 | tee /tmp/phase-6-30b-post-m32d-treatment.log
+```
+
+**Pre-dispatch checklist** (operator-coordinated; all items pending while #1832 is open):
+
+1. [ ] M32d PR (#1832) MERGED into aprender main
+2. [ ] apr binary rebuilt from main + installed at `/home/noah/.local/bin/apr`
+3. [ ] Verify version: `apr --version` shows a hash newer than `e58925095` (#1814)
+4. [ ] Smoke check: `apr serve run /home/noah/models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --port 19999 --host 127.0.0.1 --gpu`, POST a small chat request, verify ≥ 5 tok/s
+5. [ ] Run treatment bench (cmd above). Wall: ~10 hours estimated at 9.62 tok/s × 20 fixtures.
+6. [ ] Run control bench: same command with `PHASE6_COMPLIANCE_ENFORCED=0`, `tee /tmp/phase-6-30b-post-m32d-control.log`. Wall: ~10 hours.
+7. [ ] Both scores.json files land at `evidence/under-contract/scores.json` (treatment) + `evidence/under-contract-control/scores.json` (control).
+
+Acceptance: `student_pass_rate > 0` in EITHER scores.json discharges V1_004.
+
+## Expected outcome
+
+Given Qwen3-Coder-30B-A3B's capacity (much higher than the 1.5B baseline), under-contract dispatch SHOULD produce non-zero student pass rate on at least some leetcode / unix-utility fixtures. The exact pass rate is unknown until the bench runs — that's the measurement.
+
+The pair (treatment, control) lets the analyzer compute the meaningful `compliance_cost_ratio`:
+
+- `student_pass_rate(treatment)` — pass rate when per-turn `pmat comply check --strict` is enforced
+- `student_pass_rate(control)` — pass rate when compliance is OFF
+- Ratio `= treatment / control` indicates the cost of the contract discipline on a STILL-CAPABLE student (vs the M280 1.5B floor where both were 0)
+
+## What's NOT in this M286 doc
+
+- No new contract gates (V1_004 is unchanged; only its PREREQUISITE status flipped from BLOCKED to MET)
+- No new CCPA-side code (the bench script + analyzer + harness all unchanged)
+- No bench dispatch itself (operator-coordinated, ~10hr wall)
+
+## Cross-references
+
+- [aprender#1832](https://github.com/paiml/aprender/pull/1832) — M32d KV cache implementation (open; in CI as of 2026-05-20)
+- [aprender#1829](https://github.com/paiml/aprender/pull/1829) — superseded by #1832 (closed)
+- [`docs/specifications/m32d-moe-kv-cache-scope.md`](https://github.com/paiml/aprender/blob/main/docs/specifications/m32d-moe-kv-cache-scope.md) — scope + engineer playbook (referenced for historical context)
+- [`contracts/qwen3-moe-serve-dispatch-v1.yaml`](https://github.com/paiml/aprender/blob/main/contracts/qwen3-moe-serve-dispatch-v1.yaml) v1.2.0 — V1_004 `prerequisite_status: MET`
+- [`evidence/phase-6/30b-moe-empirical-2026-05-19.md`](30b-moe-empirical-2026-05-19.md) — pre-M32d timeout-class evidence (5 dispatches)
+- [`evidence/phase-6/1.5b-calibration-run.md`](1.5b-calibration-run.md) — M270/M280 1.5B baseline
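
The ratio and the acceptance rule described in `m32d-shipped-2026-05-20.md` can be sketched in shell. This is a minimal illustration and not the analyzer's actual implementation: the name `compliance_cost_ratio` is borrowed from the text, but the function signatures, the `v1_004_discharged` helper, the `undefined` sentinel, and the four-decimal formatting are all assumptions made here. Both functions take the two `student_pass_rate` values as plain arguments; how those values are read out of each `scores.json` is omitted because the file's schema is not shown in the diff.

```bash
#!/usr/bin/env bash

# Hypothetical helper: ratio = treatment / control.
# Prints "undefined" when the control pass rate is zero, because the ratio
# carries no information there (the 0/0 case of the 1.5B baseline).
compliance_cost_ratio() {
  local treatment="$1" control="$2"
  awk -v t="$treatment" -v c="$control" 'BEGIN {
    if (c + 0 == 0) { print "undefined"; exit }
    printf "%.4f\n", t / c
  }'
}

# Hypothetical helper for the acceptance rule quoted in the doc:
# a pass rate > 0 in EITHER scores.json discharges V1_004.
# Returns exit status 0 when discharged, 1 otherwise.
v1_004_discharged() {
  local treatment="$1" control="$2"
  awk -v t="$treatment" -v c="$control" 'BEGIN {
    exit ((t + 0 > 0) || (c + 0 > 0)) ? 0 : 1
  }'
}
```

With these definitions, `compliance_cost_ratio 0.25 0.50` prints `0.5000` and `compliance_cost_ratio 0 0` prints `undefined`. Returning a sentinel instead of dividing keeps the zero-control case explicit, which is the situation the 30B run is meant to get past.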