Rating fix pass: ratify quantize-then-multiply, sub-9dp guard, base_model plumbing (#14 battery + decisions) by hhuuggoo · Pull Request #15 · saturncloud/phoebe

hhuuggoo · 2026-06-15T19:12:38Z

Fix pass on PR #14 (the YAML-rater rework), implementing Hugo's E1–E4 decisions and the round-1 battery findings. Stacked on yaml-rater. Money-path; every behavioral change pairs with an invariant-named test, and the oracle/conformance discipline is preserved (proven-teeth on the new sub-nano residue fixture).

Contracts (read first — the load-bearing surface)

New base_model event field + billing_event.base_model column. A fine-tune's HF base id rides on the metering event (E3, option a), stamped by Atlas at deploy. Atlas-side plumbing seam: atlas-auth must inject the base id on the X-Saturn-Base-Model header and add it to Traefik's authResponseHeaders allowlist (exactly as for X-Saturn-Auth-Id). Phoebe reads it defensively (absent = ""). An ft: model with an empty base_model fails loud (ErrNoPrice / UNPRICED), never silently $0.
Ratified rounding spec: quantize-then-multiply. The per-token rate (premium applied to the exact base rate) is quantized to 9dp before multiplying token counts, so the applied rate stored on each rated_usage row × tokens exactly reconstructs the billed cost (E1 self-auditing). This differs from sum-then-round only on a sub-nano premium residue and is a forced consequence of storing the rate on the row.
Sub-9dp price guard (fail closed). A per-token rate that is nonzero in the YAML but quantizes to $0 at 9dp is rejected at load (it would serve the model FREE). A literal 0 is still allowed.

The fixes (one commit each)

Ratify quantize-then-multiply; recalibrate oracle + prove teeth. Rewrote doc.go's ROUNDING / PRODUCTION-vs-ORACLE sections (they still documented sum-then-round); fixed the integration oracle to feed rate.Quantized() (it fed the un-quantized rate while production bills the quantized one — a latent miscalibration, since the existing fixture had no residue); corrected Rate()'s doc. Added TestConformance_OracleQuantizesBeforeMultiply_OnResidue — a 1-nano × 1.5 = 0.0000000015 → 0.000000002 fixture that bills 6 nano over 3 tokens and proves the guard has teeth (sum-then-round gives 5 nano; a revert flips it RED).
Fail-closed sub-9dp guard at price-file load (parseNonNegRate), naming the offending model/field/rate. TestLoad_SubNanoRateRoundsToZeroFailsClosed pins all arms (sub-nano rejected, half-up boundary loads, literal $0 loads, fine-tune own-rate covered).
Plumb base_model so ft:<checkpoint> ids price. metering.Event.BaseModel + billing_event.base_model column (0001 SQL + billing_event Alembic create, plus idempotent ADD COLUMN IF NOT EXISTS in the rating migration / 0002); X-Saturn-Base-Model identity header carried onto the Event in proxy.emit (completion AND pre-header-abort paths); drain store INSERT extended. Rater: PriceBook.ResolveEvent + a second TEMP rating_derived table (base_model → premium-applied, quantized) with a direct-over-derived COALESCE, deriving only for an ft: id that missed the direct join and carries a base_model. Tests: TestRater_FineTunePricesViaBaseModelOnEvent, TestRater_FineTuneWithoutBaseModelFailsLoud, TestResolveEvent_FineTuneViaBaseModel, live-PG TestIntegration_FineTunePricesViaBaseModel, and end-to-end TestE2E_FineTuneBillsAtBaseTimesPremium.
Mechanical findings (held back by the battery): event_count widened to BIGINT end to end (SQL cast, column in 0002/Alembic/integration schema, e2e scan); deleted dead ErrDerivationChain (one-hop is enforced at load); renamed the overclaiming TestConformance_SQLModelMatchesRateOracle → TestOracleModel_SelfConsistent (it runs no SQL); fixed TestLoad_MalformedYAMLFailsClosed's doc to enumerate the inconsistent-premium cases it tests; corrected the stale "pointer-not-copy" comment; fixed RateResult.TotalCost doc ("0", not "").

Gate

go build ./..., gofmt -l . (empty), go vet ./... and go vet -tags=integration ./..., go test -race ./..., golangci-lint v1.64.8 (default + integration tags) — all clean. Live-Postgres integration and the e2e pipeline test run green with -race -tags=integration.

Should land on #14's lineage. Once merged into yaml-rater, re-run the battery to dry.

…le + prove teeth

E1 stores the applied per-token rate on each rated_usage row, which REQUIRES the billing rate be a 9dp NUMERIC the row can hold. So rating quantizes the per-token rate (premium applied to the exact base rate first) to 9dp, then multiplies by token counts. Hugo ratified this quantize-then-multiply model as the spec; it differs from the old sum-then-round only on a sub-nano premium residue, and is what self-auditing rows demand (the stored 9dp rate x tokens must reconstruct the cost).

The code already quantized (the SQL price table is NUMERIC(20,9); the in-Go oracle fed .Quantized()), but three surfaces lied or were mis-calibrated:

- doc.go's ROUNDING + PRODUCTION-vs-ORACLE sections still documented sum-then-round. Rewrite to document quantize-then-multiply, why E1 forces it, and that it is a deliberate ratified choice.
- store_integration_test.go's TestIntegration_RateWindow_ConformsToOracle fed the oracle the UN-quantized resolved rate while production bills the quantized one: a latent miscalibration (the existing fixture has no residue, so it never diverged). Feed rate.Quantized() so the oracle mirrors production.
- oracle_test.go's Rate() doc claimed sum-then-round; correct it (Rate is faithful only when fed a quantized rate, as all conformance callers now do).

Add TestConformance_OracleQuantizesBeforeMultiply_OnResidue: a 1-nano base x 1.5 fine-tune (0.0000000015 -> 0.000000002) rated through the REAL SQL, asserting the SQL agrees with the quantized oracle (6 nano over 3 tokens) AND proving the guard has teeth. The old sum-then-round value (5 nano) demonstrably differs, so a revert to the un-quantized oracle flips the test RED. Closes the latent miscalibration permanently.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A per-token rate finer than 9dp (nano-USD, the NUMERIC(20,9) money scale) is silently rounded at projection. One that rounds to zero (e.g. "0.0000000001" -> 0.000000000) would serve the model for FREE, the precise silent-lost-revenue outcome this package exists to prevent. An operator who writes a nonzero number intends a nonzero price, so a round-to-zero is a MIS-PRICED model, not a free one.

Guard at price-file LOAD time (parseNonNegRate, the per-rate validator): reject a rate that is nonzero in the file but quantizes to $0 at 9dp, fail-closed with an error naming the offending model + field + rate. A literal "0" (an intentional free rate) is still allowed; the guard targets only "nonzero number we'd round to zero". Covers base rates AND fine-tune own-rates (same parseRate3 path).

Test TestLoad_SubNanoRateRoundsToZeroFailsClosed pins all four arms:

- sub-nano nonzero rejected (naming the model)
- half-up boundary 0.0000000005 -> 1 nano still loads
- literal $0 still loads
- fine-tune own-rate sub-nano rejected

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ing event (E3)

A fine-tune's price key is ft:<checkpoint> (E3), a per-deployment checkpoint id the price file never names. To price it at base x premium the rater needs the base, which the file can't hold. Hugo's decision (option a): stamp the base model onto the metering event at deploy time, where Atlas has already validated it present (a fine-tune cannot deploy without a base; that is a hard precondition). Phoebe now CARRIES and USES it.

Plumbing (additive, hot-path-safe; empty base_model is valid for a base model):

- metering.Event gains BaseModel; billing_event gains a base_model column (0001 SQL + the billing_event Alembic create, plus an idempotent ADD COLUMN IF NOT EXISTS in the rating migration / 0002 so an already-applied billing_event picks it up).
- identity: new X-Saturn-Base-Model header (the Atlas-side injection is the documented seam), read defensively (absent = ""), carried on Identity and stamped onto the Event in proxy.emit, for BOTH the completion and pre-header-abort paths.
- drain store: base_model added to the INSERT column list + eventArgs (nullStr).

Rater (the money path):

- PriceBook.ResolveEvent(modelID, baseModel): direct model_id price wins; else an ft: id with a priced base_model resolves to base x premium (one hop); else ErrNoPrice.
- store.go projects a second TEMP table rating_derived (base_model -> premium-applied, 9dp-quantized rate) and the SQL COALESCEs direct-over-derived, deriving ONLY for an ft: model_id that missed the direct join and carries a base_model.

FAIL-CLOSED INVARIANT: an ft: model_id with an EMPTY base_model is a base_model propagation bug, NOT a free model. It resolves to ErrNoPrice, is counted UNPRICED, and screams (exit-nonzero), never silently $0-billed. Pinned by name in TestRater_FineTuneWithoutBaseModelFailsLoud and the SQL TestIntegration_FineTunePricesViaBaseModel.

Tests: TestRater_FineTunePricesViaBaseModelOnEvent, TestResolveEvent_FineTuneViaBaseModel, TestRater_FineTuneWithoutBaseModelFailsLoud, the live-PG TestIntegration_FineTunePricesViaBaseModel, and the end-to-end TestE2E_FineTuneBillsAtBaseTimesPremium (ft: request carrying the base_model header bills at base x 1.5 through the whole pipe).

doc.go/pricebook.go lose the "flagged/unlinked gap" non-goal: ft: pricing now works via the event's base_model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Grouped mechanical cleanups, no behavioral money change beyond the event_count width:

- event_count widened to BIGINT end to end (the earlier fast-follow's widening didn't carry into the YAML-rater rewrite): the SQL COUNT(*) cast (::int -> ::bigint), the rated_usage column (0002 SQL + rating Alembic + the integration schema), and the e2e scan (int -> int64). An INTEGER column silently caps a hot (auth, model, hour) bucket at 2^31 while SUM(event_count) is already ::bigint.
- Deleted the dead ErrDerivationChain sentinel: it was documented as returned by the oracle but never was. One-hop derivation is enforced at LOAD (buildPriceBook rejects multi-hop / dangling derived_from) and in ResolveEvent, so a deeper chain can never reach the oracle; there is nothing to return. Replaced with a comment stating where the invariant actually lives.
- Renamed the overclaiming TestConformance_SQLModelMatchesRateOracle -> TestOracleModel_SelfConsistent: it runs NO SQL, it pins the in-Go oracle's self-consistency; the REAL SQL conformance is the integration test (cross-reference fixed).
- TestLoad_MalformedYAMLFailsClosed: the doc now accurately enumerates every shape it pins, INCLUDING the inconsistent-premium-policy cases (multiplier-no-factor, markup-with-factor, unknown-policy) it already tested but didn't mention.
- Corrected the stale "pointer-not-copy rule" comment on TestLoad_FineTuneIdentityPremium (it tests identity-default = base exactly, not the propagation rule).
- Fixed RateResult.TotalCost doc: the SQL COALESCEs SUM to 0, so an empty window returns "0", never ""; the doc now matches reality.
- migrations/README: the fine-tune base-linkage "gap (flagged)" is now "closed" (carried on billing_event.base_model).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

hhuuggoo · 2026-06-15T19:23:33Z

🔋 Battery review — Round 1 (status: ESCALATE)

Tier-2 review of the fine-tune-pricing fix pass. 19 raw → 5 refuted → 14 confirmed + 5 persona. The base_model plumbing works, but the derived-rate path has correctness gaps the round-1 fixes didn't cover. Not merge-ready.

🔴 High-sev mechanical (will fix) — the round-to-zero guard misses the derived path

The sub-9dp guard added last pass only protects file-declared rates. A fine-tune premium that drives the derived rate to $0 (factor "0", a tiny fractional factor, a negative-rounding markup) is not rejected → every ft: event bills $0, silently, counted as rated not unpriced. Same bug class as the guard we just added, on the path it didn't cover. Battery proved it empirically. Fix: apply the round-to-zero fail-closed guard to the derived rate at load time too. (Mechanical — I'll fix it.)

🛑 Decisions for Hugo

1. Same ft: id, different bases → one rollup or two? (the rollup-grain one-way door.) rated_usage's key is (auth_id, model_id, window_start) — it omits base_model. So two fine-tune deployments reporting the same ft: model_id but derived from different bases merge into one rollup at a single rate (and the applied-rate columns take MIN, so the cheaper base wins — under-billing). In practice ft:<checkpoint_artifact_id> is globally unique, so this likely can't happen — but it depends on that uniqueness being guaranteed. Decide: is ft: id globally unique (then this is impossible — add a comment + a uniqueness assertion and move on), or can two deployments share one (then base_model must join the rollup key + migration)? My read: checkpoint artifact ids are uuid4, so unique — but you own confirming the invariant.

2. Own-rate fine-tune as a derivation base → multi-hop pricing (the one-hop contract). A fine-tune with its own explicit rate is currently projected as a possible base for another fine-tune's premium → a fine-tune deriving from a fine-tune, which the one-hop rule was meant to forbid. The Go oracle mirrors this, so the conformance test can't catch it. Decide the intended one-hop contract: exclude own-rate fine-tunes from being derivation bases (my lean — matches "one hop only"), or accept fine-tune-of-fine-tune pricing. Then the filter + oracle get fixed together.

🟡 Smaller design/contract (I'll fix once #1–#2 land)

Migration-chain ownership: the rating migration's downgrade() drops billing_event.base_model, but that column is now owned by the billing_event create migration — a partial downgrade leaves a diverged schema. Fix: rating's downgrade shouldn't drop a column it doesn't own.
Prefix encoded twice: SQL hardcodes 'ft:%' while Go reads the fineTunePrefix constant — two sources of truth for a money-path contract. Single-source it.
The base_model-absent fail-loud path (#1 from escalations): operational blast radius of the cross-repo header seam. If Traefik doesn't allowlist X-Saturn-Base-Model, all fine-tune traffic goes UNPRICED. That is fail-loud, not silent, which is correct, but it means fine-tune billing is dark until the header is wired. Worth knowing for the Atlas-side rollout.
Plus: the deterministic rollup-id md5 separator collision, several overclaiming doc comments asserting invariants that don't hold, a missing e2e fail-closed assertion.
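One way to single-source the fine-tune prefix noted above is to build the SQL LIKE pattern from the Go constant and pass it as a query parameter, so SQL and Go cannot drift apart. This is a sketch of that idea only: `likePrefixPattern` is a hypothetical helper, the query text in the comment is illustrative rather than Phoebe's actual SQL, and it assumes PostgreSQL's default backslash escape character for LIKE.

```go
package main

import (
	"fmt"
	"strings"
)

// fineTunePrefix is the single source of truth for the fine-tune id prefix.
const fineTunePrefix = "ft:"

// likePrefixPattern turns a literal prefix into a LIKE pattern that matches
// any string starting with it. LIKE metacharacters in the prefix are escaped
// with a backslash so they match literally.
func likePrefixPattern(prefix string) string {
	escaper := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return escaper.Replace(prefix) + "%"
}

func main() {
	// Illustrative use: bind the pattern instead of hardcoding 'ft:%'.
	//   ... WHERE model_id LIKE $1
	fmt.Println(likePrefixPattern(fineTunePrefix)) // ft:%
}
```

Binding the pattern as a parameter keeps the prefix out of the SQL text entirely, so changing the constant changes both code paths at once.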

Battery wf_52b37bcb-f37. #15 needs the derived-rate guard (high-sev) + decisions 1–2, then a fix pass + re-battery. The base_model mechanism is sound; the derived-pricing edges are what need tightening.

hhuuggoo · 2026-06-16T01:03:04Z

Landed on main via the squashed rating merge (3b22908). Closing the stack.

hhuuggoo and others added 4 commits June 15, 2026 18:58

This was referenced Jun 15, 2026

Phoebe: enforce E3 one-hop + ft-uniqueness, derived round-to-zero guard #16

Closed

Phoebe: D1 truncation logging + D2 event_count BIGINT (Hugo's decisions) #13

Closed

hhuuggoo closed this Jun 16, 2026

hhuuggoo wants to merge 4 commits into yaml-rater from yaml-rater-fixes
