
Rating fix pass: ratify quantize-then-multiply, sub-9dp guard, base_model plumbing (#14 battery + decisions) #15

Closed
hhuuggoo wants to merge 4 commits into yaml-rater from yaml-rater-fixes

Conversation

@hhuuggoo

Contributor

Fix pass on PR #14 (the YAML-rater rework), implementing Hugo's E1–E4 decisions and the round-1 battery findings. Stacked on yaml-rater. Money-path; every behavioral change pairs with an invariant-named test, and the oracle/conformance discipline is preserved (proven-teeth on the new sub-nano residue fixture).

Contracts (read first — the load-bearing surface)

  • New base_model event field + billing_event.base_model column. A fine-tune's HF base id rides on the metering event (E3, option a), stamped by Atlas at deploy. Atlas-side plumbing seam: atlas-auth must inject the base id on the X-Saturn-Base-Model header and add it to Traefik's authResponseHeaders allowlist (exactly as for X-Saturn-Auth-Id). Phoebe reads it defensively (absent = ""). An ft: model with an empty base_model fails loud (ErrNoPrice / UNPRICED), never silently $0.
  • Ratified rounding spec: quantize-then-multiply. The per-token rate (premium applied to the exact base rate) is quantized to 9dp before multiplying token counts, so the applied rate stored on each rated_usage row × tokens exactly reconstructs the billed cost (E1 self-auditing). This differs from sum-then-round only on a sub-nano premium residue and is a forced consequence of storing the rate on the row.
  • Sub-9dp price guard (fail closed). A per-token rate that is nonzero in the YAML but quantizes to $0 at 9dp is rejected at load (it would serve the model FREE). A literal 0 is still allowed.
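
The difference between the two rounding models can be checked with exact rational arithmetic. The following is a minimal sketch, not the package's code: the helper names (`mustRat`, `roundHalfUpNano`, `quantizeThenMultiply`, `sumThenRound`) are hypothetical, and half-up rounding is assumed from the "half-up boundary" wording of the guard test. It reproduces the 1-nano × 1.5 residue fixture described in this PR.

```go
package main

import (
	"fmt"
	"math/big"
)

// nanoPerUSD is the 9dp money scale: 1 USD = 1e9 nano-USD.
var nanoPerUSD = big.NewRat(1_000_000_000, 1)

// mustRat parses a decimal string exactly; it panics on malformed input.
func mustRat(s string) *big.Rat {
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		panic("bad decimal: " + s)
	}
	return r
}

// roundHalfUpNano converts a non-negative USD amount to whole nano-USD.
func roundHalfUpNano(usd *big.Rat) *big.Int {
	scaled := new(big.Rat).Mul(usd, nanoPerUSD)
	scaled.Add(scaled, big.NewRat(1, 2)) // floor(x + 1/2) for x >= 0
	return new(big.Int).Quo(scaled.Num(), scaled.Denom())
}

// quantizeThenMultiply rounds the premium-applied rate to 9dp, then
// multiplies by the token count (the ratified model).
func quantizeThenMultiply(base, premium string, tokens int64) int64 {
	rate := new(big.Rat).Mul(mustRat(base), mustRat(premium))
	q := roundHalfUpNano(rate)
	return q.Mul(q, big.NewInt(tokens)).Int64()
}

// sumThenRound multiplies the exact rate by tokens and rounds once at the
// end (the old model).
func sumThenRound(base, premium string, tokens int64) int64 {
	total := new(big.Rat).Mul(mustRat(base), mustRat(premium))
	total.Mul(total, new(big.Rat).SetInt64(tokens))
	return roundHalfUpNano(total).Int64()
}

func main() {
	fmt.Println(quantizeThenMultiply("0.000000001", "1.5", 3)) // 6 nano-USD
	fmt.Println(sumThenRound("0.000000001", "1.5", 3))         // 5 nano-USD
}
```

With the rate quantized first, the stored rate (2 nano) times 3 tokens equals the billed 6 nano, which is the self-auditing property E1 asks for; the 5-nano result cannot be reconstructed from any 9dp rate on the row.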

The fixes (one commit each)

  1. Ratify quantize-then-multiply; recalibrate oracle + prove teeth. Rewrote doc.go's ROUNDING / PRODUCTION-vs-ORACLE sections (they still documented sum-then-round); fixed the integration oracle to feed rate.Quantized() (it fed the un-quantized rate while production bills the quantized one — a latent miscalibration, since the existing fixture had no residue); corrected Rate()'s doc. Added TestConformance_OracleQuantizesBeforeMultiply_OnResidue — a 1-nano × 1.5 = 0.0000000015 → 0.000000002 fixture that bills 6 nano over 3 tokens and proves the guard has teeth (sum-then-round gives 5 nano; a revert flips it RED).
  2. Fail-closed sub-9dp guard at price-file load (parseNonNegRate), naming the offending model/field/rate. TestLoad_SubNanoRateRoundsToZeroFailsClosed pins all arms (sub-nano rejected, half-up boundary loads, literal $0 loads, fine-tune own-rate covered).
  3. Plumb base_model so ft:<checkpoint> ids price. metering.Event.BaseModel + billing_event.base_model column (0001 SQL + billing_event Alembic create, plus idempotent ADD COLUMN IF NOT EXISTS in the rating migration / 0002); X-Saturn-Base-Model identity header carried onto the Event in proxy.emit (completion AND pre-header-abort paths); drain store INSERT extended. Rater: PriceBook.ResolveEvent + a second TEMP rating_derived table (base_model → premium-applied, quantized) with a direct-over-derived COALESCE, deriving only for an ft: id that missed the direct join and carries a base_model. Tests: TestRater_FineTunePricesViaBaseModelOnEvent, TestRater_FineTuneWithoutBaseModelFailsLoud, TestResolveEvent_FineTuneViaBaseModel, live-PG TestIntegration_FineTunePricesViaBaseModel, and end-to-end TestE2E_FineTuneBillsAtBaseTimesPremium.
  4. Mechanical findings (held back by the battery): event_count widened to BIGINT end to end (SQL cast, column in 0002/Alembic/integration schema, e2e scan); deleted dead ErrDerivationChain (one-hop is enforced at load); renamed the overclaiming TestConformance_SQLModelMatchesRateOracle → TestOracleModel_SelfConsistent (it runs no SQL); fixed TestLoad_MalformedYAMLFailsClosed's doc to enumerate the inconsistent-premium cases it tests; corrected the stale "pointer-not-copy" comment; fixed RateResult.TotalCost doc ("0", not "").

Gate

go build ./..., gofmt -l . (empty), go vet ./... and go vet -tags=integration ./..., go test -race ./..., golangci-lint v1.64.8 (default + integration tags) — all clean. Live-Postgres integration and the e2e pipeline test run green with -race -tags=integration.

Should land on #14's lineage. Once merged into yaml-rater, re-run the battery to dry.

hhuuggoo and others added 4 commits June 15, 2026 18:58
…le + prove teeth

E1 stores the applied per-token rate on each rated_usage row, which REQUIRES the
billing rate be a 9dp NUMERIC the row can hold — so rating quantizes the per-token
rate (premium applied to the exact base rate first) to 9dp, then multiplies by token
counts. Hugo ratified this quantize-then-multiply model as the spec; it differs from
the old sum-then-round only on a sub-nano premium residue, and is what self-auditing
rows demand (the stored 9dp rate x tokens must reconstruct the cost).

The code already quantized (the SQL price table is NUMERIC(20,9); the in-Go oracle
fed .Quantized()), but two surfaces lied or were mis-calibrated:

- doc.go's ROUNDING + PRODUCTION-vs-ORACLE sections still documented sum-then-round.
  Rewrite to document quantize-then-multiply, why E1 forces it, and that it is a
  deliberate ratified choice.
- store_integration_test.go's TestIntegration_RateWindow_ConformsToOracle fed the
  oracle the UN-quantized resolved rate while production bills the quantized one — a
  latent miscalibration (the existing fixture has no residue, so it never diverged).
  Feed rate.Quantized() so the oracle mirrors production.
- oracle_test.go's Rate() doc claimed sum-then-round; correct it (Rate is faithful
  only when fed a quantized rate, as all conformance callers now do).

Add TestConformance_OracleQuantizesBeforeMultiply_OnResidue: a 1-nano base x 1.5
fine-tune (0.0000000015 -> 0.000000002) rated through the REAL SQL, asserting the SQL
agrees with the quantized oracle (6 nano over 3 tokens) AND proving the guard has
teeth — the old sum-then-round value (5 nano) demonstrably differs, so a revert to the
un-quantized oracle flips the test RED. Closes the latent miscalibration permanently.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

A per-token rate finer than 9dp (nano-USD, the NUMERIC(20,9) money scale) is silently
rounded at projection. One that rounds to zero — e.g. "0.0000000001" -> 0.000000000 —
would serve the model for FREE, the precise silent-lost-revenue outcome this package
exists to prevent. An operator who writes a nonzero number intends a nonzero price, so
a round-to-zero is a MIS-PRICED model, not a free one.

Guard at price-file LOAD time (parseNonNegRate, the per-rate validator): reject a rate
that is nonzero in the file but quantizes to $0 at 9dp, fail-closed with an error
naming the offending model + field + rate. A literal "0" (an intentional free rate) is
still allowed — the guard targets only "nonzero number we'd round to zero". Covers base
rates AND fine-tune own-rates (same parseRate3 path).

Test TestLoad_SubNanoRateRoundsToZeroFailsClosed pins all four arms: sub-nano nonzero
rejected (naming the model), half-up boundary 0.0000000005 -> 1 nano still loads,
literal $0 still loads, fine-tune own-rate sub-nano rejected.
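
The guard's arms can be illustrated with a small validator. This is a sketch under assumptions, not parseNonNegRate itself: `quantizeRate`, `errSubNano`, and the `example-model` / `input` labels are hypothetical names, and the real validator's signature and error text are not shown in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// errSubNano is a hypothetical sentinel for a nonzero rate that quantizes to $0.
var errSubNano = errors.New("nonzero rate quantizes to $0 at 9dp")

// quantizeRate parses a decimal USD rate and returns it in whole nano-USD,
// rounded half up. It rejects malformed or negative input, and fails closed
// on any nonzero rate that would round to zero.
func quantizeRate(model, field, raw string) (int64, error) {
	r, ok := new(big.Rat).SetString(raw)
	if !ok || r.Sign() < 0 {
		return 0, fmt.Errorf("%s.%s: invalid rate %q", model, field, raw)
	}
	scaled := new(big.Rat).Mul(r, big.NewRat(1_000_000_000, 1))
	scaled.Add(scaled, big.NewRat(1, 2)) // floor(x + 1/2) for x >= 0
	q := new(big.Int).Quo(scaled.Num(), scaled.Denom())
	if r.Sign() != 0 && q.Sign() == 0 {
		// Name the offending model, field, and rate, as the guard does.
		return 0, fmt.Errorf("%s.%s: rate %q: %w", model, field, raw, errSubNano)
	}
	return q.Int64(), nil
}

func main() {
	// Sub-nano nonzero (rejected), half-up boundary (1 nano), literal zero (allowed).
	for _, raw := range []string{"0.0000000001", "0.0000000005", "0"} {
		n, err := quantizeRate("example-model", "input", raw)
		fmt.Println(raw, n, err)
	}
}
```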

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ing event (E3)

A fine-tune's price key is ft:<checkpoint> (E3) — a per-deployment checkpoint id the
price file never names. To price it at base x premium the rater needs the base, which
the file can't hold. Hugo's decision (option a): stamp the base model onto the metering
event at deploy time, where Atlas has already validated it present (a fine-tune cannot
deploy without a base — a hard precondition). Phoebe now CARRIES and USES it.

Plumbing (additive, hot-path-safe; empty base_model is valid for a base model):
- metering.Event gains BaseModel; billing_event gains a base_model column (0001 SQL +
  the billing_event Alembic create, plus an idempotent ADD COLUMN IF NOT EXISTS in the
  rating migration / 0002 so an already-applied billing_event picks it up).
- identity: new X-Saturn-Base-Model header (the Atlas-side injection is the documented
  seam), read defensively (absent = ""), carried on Identity and stamped onto the
  Event in proxy.emit — for BOTH the completion and pre-header-abort paths.
- drain store: base_model added to the INSERT column list + eventArgs (nullStr).

Rater (the money path):
- PriceBook.ResolveEvent(modelID, baseModel): direct model_id price wins; else an ft:
  id with a priced base_model resolves to base x premium (one hop); else ErrNoPrice.
- store.go projects a second TEMP table rating_derived (base_model -> premium-applied,
  9dp-quantized rate) and the SQL COALESCEs direct-over-derived, deriving ONLY for an
  ft: model_id that missed the direct join and carries a base_model.

FAIL-CLOSED INVARIANT: an ft: model_id with an EMPTY base_model is a base_model
propagation bug, NOT a free model — it resolves to ErrNoPrice, is counted UNPRICED, and
screams (exit-nonzero), never silently $0-billed. Pinned by name in
TestRater_FineTuneWithoutBaseModelFailsLoud and the SQL TestIntegration_FineTunePricesViaBaseModel.
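
The resolution order and the fail-closed invariant can be sketched as follows. This is an illustration only: the `priceBook` type, its fields, and the integer premium ratio are hypothetical, and it simplifies by starting from base rates already expressed in whole nano-USD, whereas the rater described here applies the premium to the exact base rate before quantizing.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNoPrice stands in for the ErrNoPrice sentinel named in the text.
var errNoPrice = errors.New("no price")

const fineTunePrefix = "ft:"

// priceBook is a hypothetical in-memory book: rates in nano-USD per token
// and one fine-tune premium expressed as a ratio (3/2 means 1.5x).
type priceBook struct {
	direct     map[string]int64
	premiumNum int64
	premiumDen int64
}

// resolveEvent: a direct model_id price wins; else an ft: id with a priced
// base_model derives base x premium (one hop, half-up to whole nano);
// else errNoPrice, so an empty base_model never bills $0 silently.
func (b priceBook) resolveEvent(modelID, baseModel string) (int64, error) {
	if rate, ok := b.direct[modelID]; ok {
		return rate, nil
	}
	if strings.HasPrefix(modelID, fineTunePrefix) && baseModel != "" {
		if base, ok := b.direct[baseModel]; ok {
			// Half-up integer rounding of base*num/den for non-negative values.
			return (2*base*b.premiumNum + b.premiumDen) / (2 * b.premiumDen), nil
		}
	}
	return 0, fmt.Errorf("%w: model %q base %q", errNoPrice, modelID, baseModel)
}

func main() {
	book := priceBook{direct: map[string]int64{"base-model": 1000}, premiumNum: 3, premiumDen: 2}
	cases := [][2]string{{"base-model", ""}, {"ft:abc", "base-model"}, {"ft:abc", ""}}
	for _, c := range cases {
		rate, err := book.resolveEvent(c[0], c[1])
		fmt.Println(c[0], rate, err)
	}
}
```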

Tests: TestRater_FineTunePricesViaBaseModelOnEvent, TestResolveEvent_FineTuneViaBaseModel,
TestRater_FineTuneWithoutBaseModelFailsLoud, the live-PG TestIntegration_FineTunePricesViaBaseModel,
and the end-to-end TestE2E_FineTuneBillsAtBaseTimesPremium (ft: request carrying the
base_model header bills at base x 1.5 through the whole pipe). doc.go/pricebook.go lose
the "flagged/unlinked gap" non-goal — ft: pricing now works via the event's base_model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Grouped mechanical cleanups, no behavioral money change beyond the event_count width:

- event_count widened to BIGINT end to end (the earlier fast-follow's widening didn't
  carry into the YAML-rater rewrite): the SQL COUNT(*) cast (::int -> ::bigint), the
  rated_usage column (0002 SQL + rating Alembic + the integration schema), and the e2e
  scan (int -> int64). An INTEGER column silently caps a hot (auth, model, hour) bucket
  at 2^31 while SUM(event_count) is already ::bigint.
- Deleted the dead ErrDerivationChain sentinel: it was documented as returned by the
  oracle but never was. One-hop derivation is enforced at LOAD (buildPriceBook rejects
  multi-hop / dangling derived_from) and in ResolveEvent, so a deeper chain can never
  reach the oracle — there is nothing to return. Replaced with a comment stating where
  the invariant actually lives.
- Renamed the overclaiming TestConformance_SQLModelMatchesRateOracle ->
  TestOracleModel_SelfConsistent: it runs NO SQL, it pins the in-Go oracle's
  self-consistency; the REAL SQL conformance is the integration test (cross-reference
  fixed).
- TestLoad_MalformedYAMLFailsClosed: the doc now accurately enumerates every shape it
  pins, INCLUDING the inconsistent-premium-policy cases (multiplier-no-factor,
  markup-with-factor, unknown-policy) it already tested but didn't mention.
- Corrected the stale "pointer-not-copy rule" comment on TestLoad_FineTuneIdentityPremium
  (it tests identity-default = base exactly, not the propagation rule).
- Fixed RateResult.TotalCost doc: the SQL COALESCEs SUM to 0, so an empty window returns
  "0", never "" — doc now matches reality.
- migrations/README: the fine-tune base-linkage "gap (flagged)" is now "closed" (carried
  on billing_event.base_model).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hhuuggoo

Contributor Author

🔋 Battery review — Round 1 (status: ESCALATE)

Tier-2 review of the fine-tune-pricing fix pass. 19 raw → 5 refuted → 14 confirmed + 5 persona. The base_model plumbing works, but the derived-rate path has correctness gaps the round-1 fixes didn't cover. Not merge-ready.

🔴 High-sev mechanical (will fix) — the round-to-zero guard misses the derived path

The sub-9dp guard added last pass only protects file-declared rates. A fine-tune premium that drives the derived rate to $0 (factor "0", a tiny fractional factor, a negative-rounding markup) is not rejected → every ft: event bills $0, silently, counted as rated not unpriced. Same bug class as the guard we just added, on the path it didn't cover. Battery proved it empirically. Fix: apply the round-to-zero fail-closed guard to the derived rate at load time too. (Mechanical — I'll fix it.)
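
One possible shape for the proposed fix, as a sketch under assumptions: `deriveRate` is a hypothetical helper covering only the multiplier case (the markup case is omitted), and it assumes that a literally free base may still yield a free fine-tune, a policy this PR does not state.

```go
package main

import (
	"fmt"
	"math/big"
)

// deriveRate applies a multiplier premium to a base rate (both decimal USD
// strings), quantizes to whole nano-USD half up, and fails closed when a
// priced base would yield a $0 fine-tune rate.
func deriveRate(base, factor string) (int64, error) {
	b, okBase := new(big.Rat).SetString(base)
	f, okFactor := new(big.Rat).SetString(factor)
	if !okBase || !okFactor || b.Sign() < 0 || f.Sign() < 0 {
		return 0, fmt.Errorf("invalid base %q or factor %q", base, factor)
	}
	scaled := new(big.Rat).Mul(b, f)
	scaled.Mul(scaled, big.NewRat(1_000_000_000, 1))
	scaled.Add(scaled, big.NewRat(1, 2)) // floor(x + 1/2) for x >= 0
	q := new(big.Int).Quo(scaled.Num(), scaled.Denom())
	if b.Sign() > 0 && q.Sign() == 0 {
		// Assumption: only a priced (nonzero) base triggers the guard.
		return 0, fmt.Errorf("derived rate %s x %s quantizes to $0 at 9dp", base, factor)
	}
	return q.Int64(), nil
}

func main() {
	// Factor "0" and a tiny fractional factor are rejected; 1.5 derives normally.
	for _, factor := range []string{"0", "0.0001", "1.5"} {
		n, err := deriveRate("0.000001", factor)
		fmt.Println(factor, n, err)
	}
}
```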

🛑 Decisions for Hugo

1. Same ft: id, different bases → one rollup or two? (the rollup-grain one-way door.) rated_usage's key is (auth_id, model_id, window_start) — it omits base_model. So two fine-tune deployments reporting the same ft: model_id but derived from different bases merge into one rollup at a single rate (and the applied-rate columns take MIN, so the cheaper base wins — under-billing). In practice ft:<checkpoint_artifact_id> is globally unique, so this likely can't happen — but it depends on that uniqueness being guaranteed. Decide: is ft: id globally unique (then this is impossible — add a comment + a uniqueness assertion and move on), or can two deployments share one (then base_model must join the rollup key + migration)? My read: checkpoint artifact ids are uuid4, so unique — but you own confirming the invariant.

2. Own-rate fine-tune as a derivation base → multi-hop pricing (the one-hop contract). A fine-tune with its own explicit rate is currently projected as a possible base for another fine-tune's premium → a fine-tune deriving from a fine-tune, which the one-hop rule was meant to forbid. The Go oracle mirrors this, so the conformance test can't catch it. Decide the intended one-hop contract: exclude own-rate fine-tunes from being derivation bases (my lean — matches "one hop only"), or accept fine-tune-of-fine-tune pricing. Then the filter + oracle get fixed together.

🟡 Smaller design/contract (I'll fix once decisions 1 and 2 land)

  • Migration-chain ownership: the rating migration's downgrade() drops billing_event.base_model, but that column is now owned by the billing_event create migration — a partial downgrade leaves a diverged schema. Fix: rating's downgrade shouldn't drop a column it doesn't own.
  • Prefix encoded twice: SQL hardcodes 'ft:%' while Go reads the fineTunePrefix constant — two sources of truth for a money-path contract. Single-source it.
  • The base_model-absent fail-loud path (#1 from escalations): the operational blast radius of the cross-repo header seam. If Traefik doesn't allowlist X-Saturn-Base-Model, all fine-tune traffic goes UNPRICED. That is fail-loud rather than silent, which is correct, but it means fine-tune billing is dark until the header is wired. Worth knowing for the Atlas-side rollout.
  • Plus: the deterministic rollup-id md5 separator collision, several overclaiming doc comments asserting invariants that don't hold, a missing e2e fail-closed assertion.
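
The separator-collision class mentioned in the last bullet can be shown in isolation. The PR does not show how the rollup id is actually built, so this is a generic sketch: `naiveID` and `framedID` are hypothetical, the `|` separator is an assumption, and length-prefixing is only one way to make field boundaries unambiguous.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// naiveID joins fields with a bare separator before hashing, so a separator
// inside a field shifts the boundary without changing the hashed bytes.
func naiveID(fields ...string) string {
	sum := md5.Sum([]byte(strings.Join(fields, "|")))
	return hex.EncodeToString(sum[:])
}

// framedID length-prefixes each field so field boundaries are unambiguous.
func framedID(fields ...string) string {
	var sb strings.Builder
	for _, f := range fields {
		sb.WriteString(strconv.Itoa(len(f)))
		sb.WriteByte(':')
		sb.WriteString(f)
	}
	sum := md5.Sum([]byte(sb.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Both naive inputs hash the same bytes "a|b|c".
	fmt.Println(naiveID("a|b", "c") == naiveID("a", "b|c")) // true
	// The framed inputs hash different byte strings.
	fmt.Println(framedID("a|b", "c") == framedID("a", "b|c"))
}
```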

Battery wf_52b37bcb-f37. #15 needs the derived-rate guard (high-sev) + decisions 1–2, then a fix pass + re-battery. The base_model mechanism is sound; the derived-pricing edges are what need tightening.

@hhuuggoo

Contributor Author

Landed on main via the squashed rating merge (3b22908). Closing the stack.

@hhuuggoo hhuuggoo closed this Jun 16, 2026