evidence(distill): Phase 3 F-DISTILL-SMOKE-001 PASS on gx10 GB10 by noahgift · Pull Request #1828 · paiml/aprender

noahgift · 2026-05-20T05:09:40Z

🎉 F-DISTILL-SMOKE-001 DISCHARGED

Real distillation 1.5B Qwen2.5-Coder teacher → 0.5B Qwen2.5-Coder student on Blackwell GB10 (sm_121):

initial_loss = 7.6746
final_loss   = 7.2036   ← LESS THAN initial
62 steps, 122.7s, no errors

Phase 3 of SPEC-DISTILL-001 is COMPLETE.
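
The pass criterion for F-DISTILL-SMOKE-001 is simply final_loss < initial_loss. Here is a minimal sketch of that check applied to the numbers reported above. The function name `smoke_summary` and the returned field names are hypothetical, chosen for illustration; they are not taken from the aprender codebase.

```python
def smoke_summary(initial_loss, final_loss, steps, seconds):
    """Evaluate the smoke gate and derive a few convenience numbers.

    Hypothetical helper: only the criterion final_loss < initial_loss
    comes from the PR; the other fields are illustrative.
    """
    return {
        # The falsification criterion named in the PR.
        "passed": final_loss < initial_loss,
        # Fraction of the initial loss that was removed.
        "relative_drop": (initial_loss - final_loss) / initial_loss,
        # Average wall-clock time per optimizer step.
        "seconds_per_step": seconds / steps,
    }


# Values reported for the gx10 run.
summary = smoke_summary(7.6746, 7.2036, steps=62, seconds=122.7)
print(summary)
```

With the reported values the gate passes: the loss drops by roughly 6% of its initial value, at about 2 seconds per step. A strict less-than gate only shows that training moves in the right direction, which is all a smoke test is meant to show.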

What this proves

End-to-end on Blackwell with the full cascade:

✅ Teacher load (1.5B Qwen → 28 transformer blocks)
✅ Student load (0.5B Qwen → 24 transformer blocks)
✅ Forward pass (cuBLAS + pre-warmed PTX)
✅ KD loss computation (kd_step + DistillationLoss)
✅ Backward pass (no JIT-mid-training stream poisoning)
✅ Optimizer step (gradient accumulation + AdamW)
✅ Multi-step convergence (loss decreasing)
✅ Output checkpoint written (student-trained.apr/model.safetensors)
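
For readers unfamiliar with the KD loss step in the list above, here is a sketch of one common formulation: the temperature-scaled KL divergence between teacher and student token distributions. This PR does not state which formula `kd_step` / `DistillationLoss` actually use, so the function names, the temperature default, and the T² scaling below are assumptions, not the aprender implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumed formulation (standard soft-target distillation); scaled by
    T^2 so gradient magnitudes stay comparable across temperatures.
    Assumes logits are moderate enough that no probability underflows
    to exactly zero.
    """
    p = softmax(teacher_logits, temperature)  # soft targets
    q = softmax(student_logits, temperature)  # student prediction
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return temperature ** 2 * kl


teacher = [2.0, 1.0, 0.1]
student = [1.0, 1.0, 1.0]
print(kd_loss(student, teacher))  # positive: student disagrees with teacher
print(kd_loss(teacher, teacher))  # zero: identical distributions
```

The loss is zero only when the student reproduces the teacher's softened distribution, so a decreasing value over training steps indicates the student is moving toward the teacher.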

Cascade landed

#	PR	What
1	#1804 PMAT-700-B	cuBLAS prewarm skip
2	#1808 PMAT-698e	workspace cap (2048)
3	#1809 PMAT-698f	APR magic in weights loader
4	#1810 PMAT-698g	non-LoRA backward pre-warm
5	#1813 PMAT-698h	rms_norm_gamma_reduce pre-warm
6	#1815 PMAT-698i	FWD-CACHE diagnostic logging
7	#1817 PMAT-698j	THE root cause — warm! macro key
8	#1820 PMAT-698k	cache-key alignment (rope fwd, rmsnorm eps)
9	#1823 PMAT-698m	smoke setup non-degenerate batch
10	#1824	post-mortem doc
11	#1827 PMAT-698n	rmsnorm pre-warm at 1e-6 + 1e-5
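
Several entries in the cascade (#1817, #1820, #1827) concern the key under which a pre-warmed kernel is stored versus the key under which it is later looked up. The toy model below illustrates why a mismatch matters. Everything in it is hypothetical: the `KernelCache` class, its key layout, and the `freeze` step are illustrative stand-ins, not the actual `warm!` macro or cache used in aprender.

```python
class KernelCache:
    """Toy pre-warm cache: kernels must be compiled before training starts."""

    def __init__(self):
        self._compiled = {}
        self._frozen = False  # set once training has begun

    @staticmethod
    def key(kernel, **params):
        # A single key builder shared by warm-up and lookup, so the two
        # paths cannot drift apart (the failure mode this sketch models).
        return (kernel, tuple(sorted(params.items())))

    def prewarm(self, kernel, **params):
        self._compiled[self.key(kernel, **params)] = f"compiled:{kernel}"

    def freeze(self):
        self._frozen = True

    def is_warm(self, kernel, **params):
        return self.key(kernel, **params) in self._compiled

    def lookup(self, kernel, **params):
        if not self.is_warm(kernel, **params):
            if self._frozen:
                # Stands in for an unwanted JIT compile in mid-training.
                raise RuntimeError(f"cache miss after warm-up: {kernel} {params}")
            self.prewarm(kernel, **params)
        return self._compiled[self.key(kernel, **params)]


cache = KernelCache()
# Warm rmsnorm at both epsilon values, mirroring the idea behind #1827.
for eps in (1e-6, 1e-5):
    cache.prewarm("rmsnorm", eps=eps)
cache.freeze()

print(cache.lookup("rmsnorm", eps=1e-5))    # hit: warmed under the same key
print(cache.is_warm("rmsnorm", eps=1e-4))   # False: this eps was never warmed
```

In this model, warming only one epsilon value (or building the key differently at warm-up and at lookup) turns into a miss once training is running, which is the class of problem the cache-key alignment PRs describe.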

Test plan

Evidence-only PR; the actual code changes already landed across the 11 PRs above. This PR captures the proof-of-success log + dispatch manifest for posterity.

🤖 Generated with Claude Code

2026-05-20: real distillation 1.5B teacher → 0.5B student on Blackwell GB10 with the full PMAT-698e..n + PMAT-700-B cascade active.

initial_loss = 7.6746
final_loss   = 7.2036   ← LESS THAN initial
62 steps, 122.7s, no errors

F-DISTILL-SMOKE-001 ("final_loss < initial_loss") discharged. Phase 3 of SPEC-DISTILL-001 is COMPLETE.

Evidence:
- evidence/distill-phase-3-real-kd/dispatch.json: dispatch manifest
- evidence/distill-phase-3-real-kd/launch-final-pass.txt: full training log

Run dir on gx10: /home/noah/runs/distill-smoke-20260520-070404/
Trained student checkpoint: student-trained.apr/model.safetensors

Cascade summary (all merged):
- #1804 PMAT-700-B (cuBLAS prewarm skip)
- #1808 PMAT-698e (workspace cap)
- #1809 PMAT-698f (APR magic in weights loader)
- #1810 PMAT-698g (non-LoRA backward pre-warm)
- #1813 PMAT-698h (rms_norm_gamma_reduce pre-warm)
- #1815 PMAT-698i (FWD-CACHE diagnostic logging)
- #1817 PMAT-698j (THE root cause: warm! macro key)
- #1820 PMAT-698k (cache-key alignment: rope fwd + rmsnorm eps)
- #1823 PMAT-698m (smoke setup: non-degenerate batch)
- #1824 (post-mortem doc)
- #1827 PMAT-698n (rmsnorm pre-warm at both 1e-6 + 1e-5 eps)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 20, 2026 05:09

noahgift added 3 commits May 20, 2026 08:19

Merge branch 'main' into evidence/distill-phase-3-victory

b2134b8

Merge branch 'main' into evidence/distill-phase-3-victory

2389b5a

Merge branch 'main' into evidence/distill-phase-3-victory

cbcb00e

noahgift wants to merge 4 commits into main from evidence/distill-phase-3-victory
