evidence(distill): Phase 3 F-DISTILL-SMOKE-001 PASS on gx10 GB10 #1828

Open
noahgift wants to merge 4 commits into main from evidence/distill-phase-3-victory

Conversation

@noahgift
Contributor

🎉 F-DISTILL-SMOKE-001 DISCHARGED

Real distillation 1.5B Qwen2.5-Coder teacher → 0.5B Qwen2.5-Coder student on Blackwell GB10 (sm_121):

initial_loss = 7.6746
final_loss   = 7.2036   ← LESS THAN initial
62 steps, 122.7s, no errors

Phase 3 of SPEC-DISTILL-001 is COMPLETE.
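
The smoke criterion is simply `final_loss < initial_loss`, so the reported numbers can be re-checked by hand. The snippet below is a minimal sketch of that check, not the project's own harness: `SmokeResult`, `smoke_passes`, and `relative_drop` are hypothetical names introduced here for illustration, and the values are copied from the run summary above.

```python
from dataclasses import dataclass


@dataclass
class SmokeResult:
    """Summary of one smoke run (hypothetical container, not a project type)."""
    initial_loss: float
    final_loss: float
    steps: int
    wall_seconds: float


def smoke_passes(r: SmokeResult) -> bool:
    # The criterion is a strict decrease: equal losses do not pass.
    return r.final_loss < r.initial_loss


def relative_drop(r: SmokeResult) -> float:
    # Fraction of the initial loss removed over the run.
    return (r.initial_loss - r.final_loss) / r.initial_loss


# Values taken from the reported run.
run = SmokeResult(initial_loss=7.6746, final_loss=7.2036, steps=62, wall_seconds=122.7)

print(smoke_passes(run))                            # True
print(f"{relative_drop(run):.2%}")                  # 6.14%
print(f"{run.wall_seconds / run.steps:.2f} s/step")  # 1.98 s/step
```

The drop is about 6% over 62 steps, which is enough to discharge a smoke criterion of this shape but says nothing about final model quality.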

What this proves

End-to-end on Blackwell with the full cascade:

  • ✅ Teacher load (1.5B Qwen → 28 transformer blocks)
  • ✅ Student load (0.5B Qwen → 24 transformer blocks)
  • ✅ Forward pass (cuBLAS + pre-warmed PTX)
  • ✅ KD loss computation (kd_step + DistillationLoss)
  • ✅ Backward pass (no JIT-mid-training stream poisoning)
  • ✅ Optimizer step (gradient accumulation + AdamW)
  • ✅ Multi-step convergence (loss decreasing)
  • ✅ Output checkpoint written (student-trained.apr/model.safetensors)
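
For readers unfamiliar with the KD loss step in the list above, here is a minimal sketch of the commonly used formulation: a temperature-softened KL divergence between teacher and student distributions, blended with ordinary cross-entropy on the hard label. This is an assumption about the general technique, not a description of what `kd_step` or `DistillationLoss` actually compute; the function name `kd_loss_sketch` and the `temperature` and `alpha` defaults are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def kd_loss_sketch(student_logits, teacher_logits, target,
                   temperature=2.0, alpha=0.5):
    """Blend of softened KL(teacher || student) and hard-label cross-entropy.

    temperature and alpha are illustrative defaults, not project settings.
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Standard cross-entropy of the student against the hard label (T = 1).
    ce = -log_softmax(student_logits)[target]
    # The T^2 factor keeps soft-target gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# A student that matches the teacher exactly has zero KL term.
print(kd_loss_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], target=2, alpha=1.0))
```

With `alpha=1.0` the loss is pure distillation and vanishes when student and teacher logits agree; with `alpha=0.0` it reduces to plain cross-entropy.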

Cascade landed

| # | PR | What |
|---|---|---|
| 1 | #1804 | PMAT-700-B cuBLAS prewarm skip |
| 2 | #1808 | PMAT-698e workspace cap (2048) |
| 3 | #1809 | PMAT-698f APR magic in weights loader |
| 4 | #1810 | PMAT-698g non-LoRA backward pre-warm |
| 5 | #1813 | PMAT-698h rms_norm_gamma_reduce pre-warm |
| 6 | #1815 | PMAT-698i FWD-CACHE diagnostic logging |
| 7 | #1817 | PMAT-698j THE root cause: warm! macro key |
| 8 | #1820 | PMAT-698k cache-key alignment (rope fwd, rmsnorm eps) |
| 9 | #1823 | PMAT-698m smoke setup non-degenerate batch |
| 10 | #1824 | post-mortem doc |
| 11 | #1827 | PMAT-698n rmsnorm pre-warm at 1e-6 + 1e-5 |
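
Several entries above concern pre-warming kernels and aligning cache keys. As a rough illustration of why a key mismatch matters, the sketch below models a kernel cache where a lookup whose key differs from the pre-warmed key (for example a different rmsnorm eps) forces a compile after training has started. Everything here is hypothetical: `KernelCache`, `make_key`, `warm`, `freeze`, and `late_jit_keys` are invented for this example and do not reflect the project's `warm!` macro or its actual cache layout.

```python
def make_key(name, params):
    # The key must be built identically at warm time and at lookup time.
    return (name, tuple(sorted(params.items())))


class KernelCache:
    """Toy model of a JIT kernel cache (illustrative only)."""

    def __init__(self):
        self._compiled = {}
        self.frozen = False
        self.late_jit_keys = []  # keys compiled after freeze()

    def _compile(self, key):
        return f"compiled<{key}>"  # stand-in for a real compile step

    def warm(self, name, **params):
        key = make_key(name, params)
        self._compiled[key] = self._compile(key)

    def freeze(self):
        # Marks the end of the pre-warm phase; training begins after this.
        self.frozen = True

    def get(self, name, **params):
        key = make_key(name, params)
        if key not in self._compiled:
            if self.frozen:
                # A miss here is the "JIT mid-training" case to avoid.
                self.late_jit_keys.append(key)
            self._compiled[key] = self._compile(key)
        return self._compiled[key]


# Warming only one eps leaves the other to be compiled mid-training.
partial = KernelCache()
partial.warm("rmsnorm", eps=1e-6)
partial.freeze()
partial.get("rmsnorm", eps=1e-5)
print(len(partial.late_jit_keys))  # 1

# Warming both eps values avoids any late compile.
full = KernelCache()
for eps in (1e-6, 1e-5):
    full.warm("rmsnorm", eps=eps)
full.freeze()
full.get("rmsnorm", eps=1e-6)
full.get("rmsnorm", eps=1e-5)
print(len(full.late_jit_keys))  # 0
```

Recording late compiles rather than raising makes the mismatch easy to diagnose in a log, in the spirit of the diagnostic-logging entry in the table.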

Test plan

Evidence-only PR; the actual code changes already landed across the 11 PRs above. This PR captures the proof-of-success log + dispatch manifest for posterity.

🤖 Generated with Claude Code

2026-05-20 — real distillation 1.5B teacher → 0.5B student on
Blackwell GB10 with the full PMAT-698e..n + PMAT-700-B cascade active.

  initial_loss = 7.6746
  final_loss   = 7.2036   ← LESS THAN initial
  62 steps, 122.7s, no errors

F-DISTILL-SMOKE-001 ("final_loss < initial_loss") discharged.

Phase 3 of SPEC-DISTILL-001 is COMPLETE.

Evidence:
- evidence/distill-phase-3-real-kd/dispatch.json — dispatch manifest
- evidence/distill-phase-3-real-kd/launch-final-pass.txt — full training log

Run dir on gx10: /home/noah/runs/distill-smoke-20260520-070404/
Trained student checkpoint: student-trained.apr/model.safetensors

Cascade summary (all merged):
- #1804 PMAT-700-B  (cuBLAS prewarm skip)
- #1808 PMAT-698e   (workspace cap)
- #1809 PMAT-698f   (APR magic in weights loader)
- #1810 PMAT-698g   (non-LoRA backward pre-warm)
- #1813 PMAT-698h   (rms_norm_gamma_reduce pre-warm)
- #1815 PMAT-698i   (FWD-CACHE diagnostic logging)
- #1817 PMAT-698j   (THE root cause — warm! macro key)
- #1820 PMAT-698k   (cache-key alignment: rope fwd + rmsnorm eps)
- #1823 PMAT-698m   (smoke setup: non-degenerate batch)
- #1824             (post-mortem doc)
- #1827 PMAT-698n   (rmsnorm pre-warm at both 1e-6 + 1e-5 eps)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 20, 2026 05:09