13 changes: 8 additions & 5 deletions evidence/phase-6/1.5b-calibration-run.md
@@ -97,13 +97,16 @@ The compound effect of Option A + B in the squashed commit:

 - [`evidence/under-contract/`](../under-contract/) — M270 treatment evidence
 - [`evidence/under-contract-control/`](../under-contract-control/) — M280 control evidence (in flight)
-- [aprender#1789](https://github.com/paiml/aprender/issues/1789) — Qwen3-MoE F32 routing deep fix. **CLOSED at code level 2026-05-19 via squash `9c974524f` + `25527499c`**. V1_004 still BLOCKED on M32d KV cache work — see [30b-moe-empirical-2026-05-19.md](30b-moe-empirical-2026-05-19.md).
+- [aprender#1789](https://github.com/paiml/aprender/issues/1789) — Qwen3-MoE F32 routing deep fix. **CLOSED at code level 2026-05-19**. V1_004 prerequisite **MET 2026-05-20 via M32d** (paiml/aprender#1832) — see [m32d-shipped-2026-05-20.md](m32d-shipped-2026-05-20.md). Bench discharge pending operator dispatch.
 - [aprender#1806](https://github.com/paiml/aprender/pull/1806) — Option A: arch-guard + scope doc + provable contract. **MERGED**.
 - [aprender#1807](https://github.com/paiml/aprender/pull/1807) — Option B: full MoE dispatch via run_qwen3_moe_generate. **MERGED** into #1806's branch.
 - [aprender#1812](https://github.com/paiml/aprender/pull/1812) — Option B follow-up: apr-cli serve mapped_gguf_model wire + configurable HTTP timeout. **MERGED**.
-- [aprender#1814](https://github.com/paiml/aprender/pull/1814) — max_tokens cap env-configurable. **OPEN** (CI green, post-rebase merge pending).
-- [30b-moe-empirical-2026-05-19.md](30b-moe-empirical-2026-05-19.md) — 5-run timeout-class progression evidence + V1_004 blocking analysis
-- [aprender qwen3-moe-serve-dispatch-fix.md](https://github.com/paiml/aprender/blob/main/docs/specifications/qwen3-moe-serve-dispatch-fix.md) — upstream scope doc (now on main)
-- [aprender qwen3-moe-serve-dispatch-v1.yaml](https://github.com/paiml/aprender/blob/main/contracts/qwen3-moe-serve-dispatch-v1.yaml) — upstream provable contract v1.1.0 (now on main)
+- [aprender#1814](https://github.com/paiml/aprender/pull/1814) — max_tokens cap env-configurable. **MERGED**.
+- [aprender#1819](https://github.com/paiml/aprender/pull/1819) — V1_001 + V1_003 integration test. **MERGED**.
+- [aprender#1826](https://github.com/paiml/aprender/pull/1826) — M32d scope doc. **MERGED**.
+- [aprender#1832](https://github.com/paiml/aprender/pull/1832) — M32d KV cache implementation (19× speedup; 9.62 tok/s sustained). **OPEN, in CI**.
+- [30b-moe-empirical-2026-05-19.md](30b-moe-empirical-2026-05-19.md) — 5-run timeout-class progression evidence + V1_004 blocking analysis (pre-M32d)
+- [m32d-shipped-2026-05-20.md](m32d-shipped-2026-05-20.md) — M32d empirical + V1_004 dispatch readiness checklist (post-M32d)
+- [aprender qwen3-moe-serve-dispatch-v1.yaml](https://github.com/paiml/aprender/blob/main/contracts/qwen3-moe-serve-dispatch-v1.yaml) — upstream provable contract (v1.1.1 on main; v1.2.0 in #1832 once merged)
 - [phase-6-results-and-next-steps.md](../../docs/specifications/phase-6-results-and-next-steps.md) — M278 synthesis doc
 - [phase-6-design-audit.md § 4](../../docs/specifications/phase-6-design-audit.md) — Popperian falsifier framing the 0/0 ratio resolves
73 changes: 73 additions & 0 deletions evidence/phase-6/m32d-shipped-2026-05-20.md
@@ -0,0 +1,73 @@
# M32d shipped — KV cache for qwen3_moe path; V1_004 dispatch ready

[Top spec](../../docs/specifications/claude-code-parity-apr-poc.md) | [Phase 6 plan](../../docs/specifications/phase-6-under-contract-bench-plan.md) | [1.5B calibration](1.5b-calibration-run.md) | [30B-MoE timeout evidence](30b-moe-empirical-2026-05-19.md)

**Status (2026-05-20, M286)**: Upstream **M32d KV cache for the qwen3_moe inference path SHIPPED** as paiml/aprender#1832 (open; in CI). The operator switched from Option (b), engineer-driven, to Option (a), in-session implementation, and delivered it as one PR with an empirically validated 19× speedup. V1_004 prerequisite met. Bench dispatch is now operator-actionable on a tractable wall (~10 hr per arm, treatment and control).

## Upstream M32d empirical results

On Qwen3-Coder-30B-A3B-Instruct-Q4_K_M (the bench's target model):

| Metric | Pre-M32d | Post-M32d | Speedup |
|---|---|---|---|
| Sustained throughput (32-token gen, 9-token prompt) | ~0.5 tok/s | **9.62 tok/s** | 19× |
| Wall on 4-token gen | 1002ms | 553ms | 1.8× |
| Numerical equivalence vs full-prefill (greedy) | — | byte-identical | ✓ |
| V1_001 + V1_003 smoke (#1819 test) | 7.84s | 9.39s | stable |

Two new cargo tests pin the invariants in CI (env-gated on `QWEN3_MOE_GGUF_PATH`, `#[ignore]`'d by default):

- `crates/aprender-serve/tests/moe_kv_cache_equivalence.rs` — generates 4 tokens via M32d cache-on AND legacy full-prefill loop; asserts greedy outputs byte-identical
- `crates/aprender-serve/tests/m32d_perf.rs` — asserts ≥ 5 tok/s sustained on 32-token gen (floor pinned via `M32D_TPS_FLOOR` constant)
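
Because both tests are `#[ignore]`'d and gated on `QWEN3_MOE_GGUF_PATH`, a plain `cargo test` skips them. A minimal sketch of running them locally follows. The wrapper `run_m32d_tests` is hypothetical (it exists in neither repo), and the package name `aprender-serve` is an assumption inferred from the `crates/aprender-serve` path.

```bash
# Hypothetical wrapper: run the two M32d tests only when the model path is usable.
run_m32d_tests() {
  # Both tests are gated on this variable, so refuse to run without it.
  if [ -z "${QWEN3_MOE_GGUF_PATH:-}" ] || [ ! -f "${QWEN3_MOE_GGUF_PATH}" ]; then
    echo "skip: QWEN3_MOE_GGUF_PATH is unset or does not point at a file" >&2
    return 2
  fi
  # `--test <name>` selects one integration-test file; `-- --ignored` runs
  # the tests marked #[ignore].
  # ASSUMPTION: the cargo package is named `aprender-serve`.
  cargo test -p aprender-serve --test moe_kv_cache_equivalence -- --ignored &&
    cargo test -p aprender-serve --test m32d_perf -- --ignored
}
```

To use it, export `QWEN3_MOE_GGUF_PATH` pointing at the GGUF file and call `run_m32d_tests` from an aprender checkout. The early return keeps a missing model from surfacing as a confusing test failure.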

## V1_004 dispatch readiness

The Phase 6 bench (`scripts/phase-6-bench.sh`) is unchanged from the M280 + M284 work. With M32d landed, the dispatch is operator-actionable:

```bash
APR_MODEL=/home/noah/models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \
PHASE6_COMPLIANCE_ENFORCED=1 \
PHASE6_MAX_TURNS=20 \
PHASE6_WALL_SECONDS=3600 \
APR_TIMEOUT_S=900 \
APR_AGENT_HTTP_TIMEOUT_S=1500 \
APR_AGENT_MAX_TOKENS_CAP=1024 \
bash scripts/phase-6-bench.sh 2>&1 | tee /tmp/phase-6-30b-post-m32d-treatment.log
```

**Dispatch checklist** (operator-coordinated; steps to run in order. As of 2026-05-20 #1832 is still open, so step 1 gates the rest):

1. ✅ M32d PR (#1832) MERGED into aprender main
2. ✅ apr binary rebuilt from main + installed at `/home/noah/.local/bin/apr`
3. ✅ Verify version: `apr --version` shows a hash newer than `e58925095` (#1814)
4. ✅ Smoke check: `apr serve run /home/noah/models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --port 19999 --host 127.0.0.1 --gpu`, POST a small chat request, verify ≥ 5 tok/s
5. ✅ Run treatment bench (cmd above). Wall: ~10 hours estimated at 9.62 tok/s × 20 fixtures.
6. ✅ Run control bench: same command with `PHASE6_COMPLIANCE_ENFORCED=0`, `tee /tmp/phase-6-30b-post-m32d-control.log`. Wall: ~10 hours.
7. ✅ Both scores.json files land at `evidence/under-contract/scores.json` (treatment) + `evidence/under-contract-control/scores.json` (control).
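
The smoke check in step 4 and the wall estimate in step 5 both reduce to simple arithmetic, sketched below. The helper names (`tok_per_sec`, `meets_floor`, `tokens_per_fixture`) are hypothetical. The 5 tok/s floor, 9.62 tok/s rate, ~10 hour wall, and 20 fixtures come from this doc. The sample timing (32 tokens in 3.2 s) is illustrative, not a measurement.

```bash
# tokens generated / elapsed seconds -> tok/s, two decimals.
tok_per_sec() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

# Succeeds (exit 0) when the measured rate is at or above the floor.
meets_floor() {
  awk -v tps="$1" -v floor="$2" 'BEGIN { exit !(tps + 0 >= floor + 0) }'
}

# Token budget per fixture implied by a wall estimate:
# rate (tok/s) * hours * 3600 / fixtures, rounded to the nearest token.
tokens_per_fixture() {
  awk -v tps="$1" -v h="$2" -v n="$3" 'BEGIN { printf "%.0f\n", tps * h * 3600 / n }'
}

rate=$(tok_per_sec 32 3.2)          # illustrative timing, not a measurement
if meets_floor "$rate" 5; then
  echo "smoke OK: ${rate} tok/s"
else
  echo "smoke FAIL: ${rate} tok/s is below the 5 tok/s floor"
fi
tokens_per_fixture 9.62 10 20       # budget implied by the ~10 hr estimate
```

If the ~10 hour figure is read as pure generation time, it corresponds to roughly 346,320 generated tokens in total, or about 17,316 per fixture. Prompt processing and tool-call overhead are not modeled here.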

Acceptance: `student_pass_rate > 0` in EITHER scores.json discharges V1_004.
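
The acceptance rule can be sketched as a small predicate. The function name `v1_004_discharged` is hypothetical, and the two rates are passed as arguments because the layout of `scores.json` is not shown in this doc; the example value 0.15 is made up for illustration.

```bash
# Exit 0 when at least one arm has a strictly positive student_pass_rate.
# $1 = treatment rate, $2 = control rate.
v1_004_discharged() {
  awk -v t="$1" -v c="$2" 'BEGIN { exit !((t + 0 > 0) || (c + 0 > 0)) }'
}

if v1_004_discharged 0.15 0; then   # hypothetical rates
  echo "V1_004 discharged"
else
  echo "V1_004 still open"
fi
```

Note that the rule is an OR over the two arms: a non-zero control rate alone is enough to discharge V1_004, even if the treatment arm stays at 0.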

## Expected outcome

Given Qwen3-Coder-30B-A3B's capacity (much higher than the 1.5B baseline), under-contract dispatch SHOULD produce a non-zero student pass rate on at least some leetcode / unix-utility fixtures. The exact pass rate is unknown until the bench runs; that is the measurement.

The pair (treatment, control) lets the analyzer compute the meaningful `compliance_cost_ratio`:

- `student_pass_rate(treatment)` — pass rate when per-turn `pmat comply check --strict` is enforced
- `student_pass_rate(control)` — pass rate when compliance is OFF
- `compliance_cost_ratio = student_pass_rate(treatment) / student_pass_rate(control)` indicates the cost of the contract discipline on a STILL-CAPABLE student (vs the M280 1.5B floor, where both rates were 0 and the ratio was an undefined 0/0)
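
A minimal sketch of the ratio, including the degenerate case from the 1.5B run, is below. The function name `compliance_cost_ratio` mirrors the metric name but is not the analyzer's actual code, and the sample rates are hypothetical.

```bash
# Ratio of treatment to control pass rate.
# Prints "undefined" when the control rate is 0 (covers the 0/0 case).
compliance_cost_ratio() {
  awk -v t="$1" -v c="$2" 'BEGIN {
    if (c + 0 == 0) { print "undefined"; exit }
    printf "%.2f\n", t / c
  }'
}

compliance_cost_ratio 0.25 0.50   # hypothetical rates for a capable student
compliance_cost_ratio 0 0         # the M280 1.5B floor: both arms at 0
```

A ratio below 1 means the enforced arm passed fewer fixtures than the control arm; a ratio of 1 means enforcement cost nothing in pass rate.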

## What's NOT in this M286 doc

- No new contract gates (V1_004 is unchanged; only its PREREQUISITE status flipped from BLOCKED to MET)
- No new CCPA-side code (the bench script + analyzer + harness all unchanged)
- No bench dispatch itself (operator-coordinated, ~10hr wall)

## Cross-references

- [aprender#1832](https://github.com/paiml/aprender/pull/1832) — M32d KV cache implementation (open; in CI as of 2026-05-20)
- [aprender#1829](https://github.com/paiml/aprender/pull/1829) — superseded by #1832 (closed)
- [`docs/specifications/m32d-moe-kv-cache-scope.md`](https://github.com/paiml/aprender/blob/main/docs/specifications/m32d-moe-kv-cache-scope.md) — scope + engineer playbook (referenced for historical context)
- [`contracts/qwen3-moe-serve-dispatch-v1.yaml`](https://github.com/paiml/aprender/blob/main/contracts/qwen3-moe-serve-dispatch-v1.yaml) v1.2.0 (reaches main once #1832 merges; main currently carries v1.1.1) — V1_004 `prerequisite_status: MET`
- [`evidence/phase-6/30b-moe-empirical-2026-05-19.md`](30b-moe-empirical-2026-05-19.md) — pre-M32d timeout-class evidence (5 dispatches)
- [`evidence/phase-6/1.5b-calibration-run.md`](1.5b-calibration-run.md) — M270/M280 1.5B baseline