Phoebe: enforce E3 one-hop + ft-uniqueness, derived round-to-zero guard#16
Phoebe: enforce E3 one-hop + ft-uniqueness, derived round-to-zero guard#16hhuuggoo wants to merge 2 commits into
Conversation
Money-path correctness on the DERIVED fine-tune pricing path, applying the battery findings + Hugo's locked E3 decisions to the edges the first pass missed. All behavioral changes pair with an invariant-named, proven-teeth test. FIX 1 — derived-rate round-to-zero fail-closed guard. The sub-9dp guard only covered FILE-DECLARED rates; a fine-tune PREMIUM that drives base x premium to $0 (factor "0", a tiny fractional factor on a small base, a negative-rounding markup) silently billed every ft: event $0, counted as rated. validateDerivedRatesNonZero now rejects, at LOAD, any premium that zeroes a NONZERO base component (a $0 base component stays a legitimate free rate). Never at bill time. Test: TestLoad_DerivedRateRoundsToZeroFailsClosed. FIX 2 — enforce E3's "ft: ids are globally unique" as a fail-LOUD invariant (not a re-key). rated_usage's key omits base_model, so two ft: deployments with the same model_id but different bases would merge and bill at the MIN (cheaper) rate. E3 mints ft:<checkpoint_artifact_id> as a uuid4, so this can't happen — the rater now DETECTS COUNT(DISTINCT base_model) FILTER (WHERE via_derived) > 1 per rollup, splits that rollup out of the upsert, and counts it as a new ambiguous_base anomaly that drives exit-nonzero. Converts "merges at cheapest rate" into "screams". Fixes the false MIN==MAX comment to state the enforced invariant. Tests: TestRater_FineTuneAmbiguousBaseModelFailsLoud, TestIntegration_FineTuneAmbiguousBaseModelFailsLoud, TestRater_RollupCostSelfAudits (cost == applied_rate x tokens per rollup). FIX 3 — enforce E3's ONE-HOP rule: an own-rate fine-tune is NOT a derivation base. derivedRates() over-projected every pb.base entry (incl. ft: own-rate fine-tunes) as a derivation source, letting a fine-tune derive from another fine-tune's own rate x premium (multi-hop). isDerivationBase() now excludes ft:-prefixed entries and derivedFrom keys; ResolveEvent (the oracle) mirrors it, so SQL and oracle agree. Tests: TestResolveEvent_FineTuneCannotDeriveFromFineTune, TestIntegration_OneHopFineTuneCannotDeriveFromFineTune. Also single-sources the ft: LIKE pattern from the Go fineTunePrefix constant (bound as $3, no literal 'ft:%' in SQL) and makes the deterministic rollup-id md5 input length-prefixed (injective — a '|' in auth_id/model_id can never collide two keys), both inseparable from the rewritten rating statement. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… assert Migration-chain ownership: the rating migration's downgrade() dropped billing_event.base_model, but that column is OWNED by the billing_event create migration (b1f0c2d3e4a5), which declares it directly. A migration must reverse only its OWN additions — dropping base_model on the rating downgrade left the schema diverged from b1f0c2d3e4a5 (column gone while its owning migration is still applied) and would silently destroy fine-tune base linkage. The rating downgrade no longer touches base_model; it is removed only by reversing billing_event's own create. Verified the chain up+down is consistent against live Postgres (base_model survives the rating downgrade; only the billing_event downgrade removes it). e2e fail-closed assertion: TestE2E_FineTuneWithoutBaseModelHeaderIsUnpriced drives the whole pipe with a fine-tune request that arrives WITHOUT the X-Saturn-Base-Model header (the propagation seam broken) and asserts the ft: event ends UNPRICED — an anomaly that screams — never silently rated or $0-billed. The mirror of the happy-path TestE2E_FineTuneBillsAtBaseTimesPremium. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
🔋 Re-battery — Round 1 (status: ESCALATE), WITH decisions-ledgerFewer findings (11 raw vs 19 last round — the ledger correctly suppressed settled E1–E4 items). What's left is genuinely new. Most are mechanical (I'll fix); two are a coherent pair of re-rate-semantics design calls for Hugo. ✅ Mechanical (I'll fix — no decision)
🛑 Two design calls for Hugo — both about re-rate semantics (new; not ledger-settled)The rater is hourly and can be re-run on an hour. These two ask what re-rating should do — a question we haven't pinned: 1. Should re-rate CONVERGE state, or only ever upsert? If a rollup billed clean in run A, then on re-run B becomes ambiguous/unpriced (e.g. new data arrived, or a price changed), it's excluded from the upsert — but the old 2. Mixed known+unknown base under one My leans: (1) reconcile/delete-on-re-rate (matches "what loads is what bills" and your self-auditing-row instinct); (2) strict-poison the whole id (consistent with the fail-loud uniqueness enforcement you already chose). But both change the idempotency/fail-closed contract, so they're yours. Battery |
|
Landed on main via the squashed rating merge (3b22908). Closing the stack. |
Second fix pass on fine-tune pricing, applying the PR #15 battery findings + Hugo's LOCKED E3 decisions to the derived-rate edges the first pass missed. Money path — every behavioral change is fail-closed and paired with an invariant-named, proven-teeth test (each demonstrated RED before its fix).
Stacked on
yaml-rater-fixes.Contracts
Derived-rate fail-closed guard (FIX 1). The round-to-zero guard now covers the DERIVED path, not just file-declared rates. At LOAD, a fine-tune premium that drives a NONZERO base component to a rate that rounds to $0 at 9dp (factor
"0", a tiny fractional factor on a small base, a negative-rounding markup) is REJECTED — naming the base model, premium policy, and component. A $0 base component stays a legitimate free rate; only a premium that silently zeroes an intended-nonzero base is caught. Never at bill time. Was: everyft:event of that base silently billed $0, counted as rated → lost revenue.ft-uniqueness invariant, fail-LOUD (FIX 2). E3 mints
ft:<checkpoint_artifact_id>as a globally-unique uuid4, so oneft:model_id can never carry two base_models. The rater now ENFORCES this: a rollup withCOUNT(DISTINCT base_model) FILTER (WHERE via_derived) > 1is split out of the upsert and counted as a newambiguous_baseanomaly that drives the exit-nonzero path — never silently billed at the MIN (cheaper) base. The rollup grain is unchanged (no re-key); the violation screams instead of under-billing. The falseMIN==MAXcomment is corrected to state the enforced invariant. Self-audit test added:cost == applied_rate × tokensper rollup.One-hop rule: an own-rate fine-tune is NOT a derivation base (FIX 3).
derivedRates()(the SQL projection) andResolveEvent()(the oracle) now shareisDerivationBase(), which excludes anyft:-prefixed entry orderivedFromkey from being a derivation source. Anft:event whosebase_modelpoints at another own-rate fine-tune fails loud (ErrNoPrice) — no fine-tune-of-fine-tune. SQL and oracle agree on the corrected contract (conformance integration test added).Migration-chain ownership of base_model (FIX 4). The rating migration's
downgrade()no longer dropsbilling_event.base_model— that column is owned by the billing_event create migration (b1f0c2d3e4a5). A migration reverses only its own additions; the rating downgrade leaves a schema consistent withb1f0c2d3e4a5. Verified the chain up+down against live Postgres (base_model survives the rating downgrade; only the billing_event downgrade removes it).Single-sourced
ft:prefix (FIX 4). The SQL no longer hardcodes a literal'ft:%'; the LIKE pattern is built from the GofineTunePrefixconstant and bound as$3— one source of truth for the ft: marker on the money path. Also: the deterministic rollup-id md5 input is now length-prefixed (injective, so a'|'in a field can never collide two keys), and a fail-closed e2e assertion was added for a base_model-absentft:event (ends UNPRICED, never silently rated).Tests (all proven RED before fix)
TestLoad_DerivedRateRoundsToZeroFailsClosedTestRater_FineTuneAmbiguousBaseModelFailsLoud,TestIntegration_FineTuneAmbiguousBaseModelFailsLoud,TestRater_RollupCostSelfAuditsTestResolveEvent_FineTuneCannotDeriveFromFineTune,TestIntegration_OneHopFineTuneCannotDeriveFromFineTuneTestE2E_FineTuneWithoutBaseModelHeaderIsUnpricedGate
go build,go vet(+-tags=integration),golangci-lintv1.64.8,gofmt -lempty,go test -race ./..., and the live-Postgres-tags=integrationsuite (rating conformance + e2e) all green.🤖 Generated with Claude Code