Phoebe: rate from YAML price config (E1) #14
Move prices off DB tables (model_price + derivation_policy) onto a versioned operator-authored YAML price file. The rater loads the file at run start, projects the resolved per-token rates (fine-tune premium applied in exact decimal) into a transient TEMP table, rates the last complete hour entirely in SQL, and FREEZES the applied per-token rates onto each rated_usage row so the row is self-auditing and immutable.

Contracts:
- New operator-facing price file (config/prices.example.yaml): base per-token rates keyed on the HF model id, the single global fine-tune premium policy (identity|multiplier|markup), and per-GPU floor rates. Rates are exact-decimal strings (never float). Loader fails closed on anything malformed.
- rated_usage gains applied_prompt_rate / applied_cached_rate / applied_completion_rate NUMERIC(20,9): the exact rate each rollup billed at.
- Dropped model_price + derivation_policy (and the GiST exclusion constraints, effective-dating, and the SQL seed). Clean rewrite of the 0002 migration + Alembic (no prod data; the Alembic was never applied to saturn).

Keeps the money discipline: NUMERIC throughout, cost computed+summed in SQL, cached-subset billable-prompt formula, fail-loud-never-$0 (ErrNoPrice / unpriced count), idempotent deterministic-id upsert, session-TZ-independent bucketing, and the Rate() oracle + live-Postgres conformance pinning the SQL.

Fine-tune base linkage is a flagged gap: billing_event carries only the engine model NAME, so an ft:<checkpoint> id prices only if the file declares its derived_from (or own rate); otherwise it is unpriced (fail loud). Base-direct models price fully.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
🔋 Battery review, Round 1 (status: ESCALATE)

Tier-2 money-path review of the YAML-rater rework. 19 raw → 4 refuted → 15 confirmed + 7 persona. Stopped on design/money decisions that must not be auto-patched. Two need Hugo; two are confirm-the-non-goal; the rest are mechanical (held back).

🛑 Money decisions for Hugo

1. Rounding model changed: quantize-then-multiply vs sum-then-round (this PR flipped it). Merged

2. Effective-dated pricing removed: confirm the late-arrival semantics. The price table + GiST + effective-dating are gone (intended; prices are YAML now). "Never reprice served traffic" now holds because the row freezes its applied rate, not because price is resolved as-of-event-time. Consequence: a late-arriving event in an already-rated hour, re-rated after a YAML price change, bills at the new rate, not the rate when it was served. Given the rater runs hourly and prices change rarely, the window is tiny, but confirm this is acceptable (it's the same call you reasoned through when choosing rate-on-row).

🛑 Confirm-the-non-goal (self-flagged, not bugs)
✅ Mechanical (held back — will fix once decisions land)
Landed on main via the squashed rating merge (3b22908). Closing the stack.
Reworks phoebe's rater to price from a versioned, operator-authored YAML price file instead of the DB price tables, per the locked E1–E4 decisions (`token-factory-rating-atlas-decisions.md`). The rater loads the file at run start, projects the resolved per-token rates into a transient TEMP table, rates the last complete hour entirely in SQL, and freezes the applied rate onto each `rated_usage` row so the row is self-auditing and immutable.

Money discipline is unchanged: NUMERIC throughout, cost computed + summed in SQL, the cached-subset billable-prompt formula, fail-loud-never-$0, idempotent deterministic-id upsert, session-TZ-independent bucketing, and the `Rate()` oracle + live-Postgres conformance pinning the SQL.

Contracts
1. The price file: the new operator-facing contract (`config/prices.example.yaml`)

Prices are a versioned YAML file the operator authors and git-tracks; the file's history IS the price audit trail (no price table, no effective-dating, no price-management UI). Rates are exact-decimal strings, never float. The loader fails closed on anything malformed (missing file, bad YAML, unknown version, float-shaped/negative rate, missing component, inconsistent premium, dangling `derived_from`).

Resolution mirrors the SQL exactly: own rate wins (base, or an ft with its own rate); else an ft's `derived_from` base × the global premium (one hop); else `ErrNoPrice` (never $0). The premium is applied to the exact base rate, then the final per-token rate is quantized to 9dp. The rate that bills and the rate stored on the row are therefore bit-identical, so cost is always reconstructable from the row.

2. `rated_usage`: applied-rate columns added

`applied_prompt_rate`, `applied_cached_rate`, and `applied_completion_rate` (all NUMERIC(20,9)) hold the exact per-token rates each rollup was billed at, frozen onto the row from the file the run loaded. The row is then immutable and self-auditing: "we never reprice traffic you've already served" holds by construction; re-rating is a deliberate, audited re-run.
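The resolution order and the quantize-then-freeze rule described in sections 1 and 2 can be sketched roughly as below. This is a minimal illustration, not the rater's actual code: `Book`, `Entry`, `Resolve`, `parseRate`, `quantize9`, and `Cost` are hypothetical names, only one rate component and only the multiplier premium policy are modelled, and the half-up rounding at 9dp is an assumption (the PR does not state the rounding mode). Only `ErrNoPrice` and `derived_from` come from the PR itself.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
	"regexp"
)

// ErrNoPrice is the error named in the PR; all other identifiers are hypothetical.
var ErrNoPrice = errors.New("no price for model")

// plainDecimal accepts unsigned digits with an optional fraction: no sign, no exponent.
var plainDecimal = regexp.MustCompile(`^[0-9]+(\.[0-9]+)?$`)

// parseRate fails closed on negative or float-shaped input and never goes through float64.
func parseRate(s string) (*big.Rat, error) {
	if !plainDecimal.MatchString(s) {
		return nil, fmt.Errorf("malformed rate %q", s)
	}
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		return nil, fmt.Errorf("malformed rate %q", s)
	}
	return r, nil
}

// quantize9 renders an exact rate at 9 decimal places (assumed rounding: half away from zero).
func quantize9(r *big.Rat) string { return r.FloatString(9) }

// Entry is one model in the price book; an empty string means "not declared".
type Entry struct {
	OwnRate     string // exact-decimal per-token rate
	DerivedFrom string // base model id, for a fine-tune without its own rate
}

// Book holds the models plus a single global fine-tune multiplier.
type Book struct {
	Multiplier string
	Models     map[string]Entry
}

// Resolve applies: own rate wins; else derived_from base x premium (one hop); else ErrNoPrice.
func (b Book) Resolve(model string) (string, error) {
	e, ok := b.Models[model]
	if !ok {
		return "", ErrNoPrice
	}
	if e.OwnRate != "" {
		own, err := parseRate(e.OwnRate)
		if err != nil {
			return "", err
		}
		return quantize9(own), nil
	}
	base, ok := b.Models[e.DerivedFrom]
	if !ok || base.OwnRate == "" {
		return "", ErrNoPrice // no second hop, and never a silent $0
	}
	rate, err := parseRate(base.OwnRate)
	if err != nil {
		return "", err
	}
	mult, err := parseRate(b.Multiplier)
	if err != nil {
		return "", err
	}
	// Premium is applied to the exact base rate; quantization happens once, at the end.
	return quantize9(rate.Mul(rate, mult)), nil
}

// Cost recomputes a charge from the 9dp rate stored on the row, exactly.
func Cost(tokens int64, storedRate string) (string, error) {
	rate, err := parseRate(storedRate)
	if err != nil {
		return "", err
	}
	return rate.Mul(rate, new(big.Rat).SetInt64(tokens)).FloatString(9), nil
}
```

With a made-up base model priced at `"0.000001500"` and a multiplier of `"1.25"`, `Resolve` on a fine-tune that declares that base yields `"0.000001875"`, and `Cost(1000000, "0.000001875")` yields `"1.875000000"`. Because the string returned by `Resolve` is both what bills and what would be stored, the cost can be recomputed from the row alone.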
3. Dropped: `model_price` + `derivation_policy`

The whole temporal price-book apparatus is gone: both tables, the `btree_gist` GiST exclusion constraints, effective-dating, and the SQL price seed (`seed_example_prices.sql`). Prices are config now.

Migration approach: clean rewrite of `0002_rating.sql` + `atlas/c2f1a3b4d5e6_add_rating.py` (create only `rated_usage` with the applied-rate columns). Justified because there is no prod data and the Alembic file was never copied into `saturn/alembic` (it's a ready-to-copy artifact maintained here). The Alembic docstring flags that if anyone has applied a `model_price`/`derivation_policy` version of this revision, they must add a follow-up drop+alter instead.

Flagged gap: fine-tune base linkage
`billing_event` carries only the engine-reported model name (no `derived_from`/`base_model` column), so a fine-tune's base is not plumbed to the rater. Base-direct models price fully today. An `ft:<checkpoint>` id prices only if the file declares its `derived_from` (or own rate); otherwise it is unpriced: fail loud, never $0 (tested). Closing the gap means either having the metering path stamp the base (`saturn.io/...base_model`) onto the event, or shipping a fine-tune→base map in the file. The premium machinery is complete and tested; only the linkage source is pending.

S3 seam (out of scope, left clean)
The price file loads from a local path (`-prices` flag / `priceFile` setting). `LoadPriceBook(localPath)` is the seam: fetch-from-S3-to-local then load. The create-time price gate (E4) and the rater must read the same file/version; a single fetched copy is the shared artifact.

Tests (all named; full gate + live-PG green)
`yaml-base-price-applied`, `yaml-fine-tune-premium-multiplier`, `yaml-fine-tune-premium-markup`, fine-tune-identity/own-rate-bypass, `missing-price-fails-loud-not-zero`, `applied-rate-stored-on-row`, `cached-subset-not-double-count`, `numeric-exactness-no-float`, `idempotent-rerun`, `malformed-yaml-fails-closed` (16 sub-cases), nil-book-fails-closed, gpu-floor parse, example-file-valid.

Integration (live PG): `RateWindow_ConformsToOracle` (asserts the applied-rate columns), `PremiumQuantizedBeforeBilling` (self-audit: stored 9dp rate reconstructs cost), UTC-bucketing, index-serves-scan, and the e2e pipeline test (price source swapped to a YAML book).

Gate: `go build`, `go test -race ./...`, `go vet` (+ `-tags=integration`), golangci-lint v1.64.8, `gofmt -l`: all clean. Live-Postgres `-tags=integration` (incl. `-race`): all green.

🤖 Generated with Claude Code