
Phoebe: rate from YAML price config (E1) #14

Closed. hhuuggoo wants to merge 1 commit into main from yaml-rater.



@hhuuggoo

Contributor

Reworks phoebe's rater to price from a versioned, operator-authored YAML price file instead of the DB price tables, per the locked E1–E4 decisions (token-factory-rating-atlas-decisions.md). The rater loads the file at run start, projects the resolved per-token rates into a transient TEMP table, rates the last complete hour entirely in SQL, and freezes the applied rate onto each rated_usage row so the row is self-auditing and immutable.

Money discipline is unchanged: NUMERIC throughout, cost computed + summed in SQL, the cached-subset billable-prompt formula, fail-loud-never-$0, idempotent deterministic-id upsert, session-TZ-independent bucketing, and the Rate() oracle + live-Postgres conformance pinning the SQL.

Contracts

1. The price file — the new operator-facing contract (config/prices.example.yaml)

Prices are a versioned YAML file the operator authors and git-tracks; the file's history IS the price audit trail (no price table, no effective-dating, no price-management UI). Rates are exact-decimal strings, never float. The loader fails closed on anything malformed (missing file, bad YAML, unknown version, float-shaped/negative rate, missing component, inconsistent premium, dangling derived_from).

```yaml
version: 1

base_models:                                  # keyed on the HF model id (E3)
  "meta-llama/Llama-3.1-8B-Instruct":
    prompt:     "0.000000200"                 # per-token USD, exact decimal string
    cached:     "0.000000050"                 # distinct, discounted; cached ⊆ prompt
    completion: "0.000000600"

fine_tune_premium:                            # the SINGLE global premium policy
  policy: multiplier                          # identity | multiplier | markup
  factor: "1.5"                               # set iff multiplier
  # markup: "0.000000100"                     # set iff markup (per-token USD)

fine_tunes:                                   # OPTIONAL ft:<checkpoint> entries (E3)
  "ft:1f0c2d3e4a5b6c7d8e9f0a1b2c3d4e5f":
    derived_from: "meta-llama/Llama-3.1-8B-Instruct"   # base × premium (one hop)
  # an ft may instead carry its own `rate:` (escape hatch; bypasses the premium)

gpu_floor_rates:                              # per-GPU floor (uptime meter later;
  "A100-80GB": "0.000000000"                  # PARSED + VALIDATED now, not yet wired)
  "H100-80GB": "0.000000000"
```

Resolution mirrors the SQL exactly: own rate wins (base, or an ft with its own rate); else an ft's derived_from base × the global premium (one hop); else ErrNoPrice (never $0). The premium is applied to the exact base rate, then the final per-token rate is quantized to 9dp — the rate that bills and the rate stored on the row are bit-identical, so cost is always reconstructable from the row.

2. rated_usage — applied-rate columns added

```sql
applied_prompt_rate     NUMERIC(20,9) NOT NULL DEFAULT 0,
applied_cached_rate     NUMERIC(20,9) NOT NULL DEFAULT 0,
applied_completion_rate NUMERIC(20,9) NOT NULL DEFAULT 0,
```

These are the exact per-token rates each rollup was billed at, frozen onto the row from the file the run loaded. The row is then immutable and self-auditing: "we never reprice traffic you've already served" holds by construction, and re-rating is a deliberate, audited re-run.
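Because the applied rates are frozen on the row, its cost can be recomputed from the row alone. This sketch assumes the cached-subset formula bills (prompt - cached) tokens at the prompt rate plus the cached tokens at the cached rate; the `Row` type and its token-count field names are hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// Row mirrors a rated_usage row's token counts (names hypothetical) and its
// frozen applied-rate columns (NUMERIC(20,9) in Postgres).
type Row struct {
	PromptTokens, CachedTokens, CompletionTokens int64
	PromptRate, CachedRate, CompletionRate       *big.Rat
}

// Cost reconstructs the billed amount from the row alone. Cached tokens are
// a subset of prompt tokens, so they are removed from the prompt-rate term
// and billed once at the cached rate (no double count).
func (r Row) Cost() *big.Rat {
	billablePrompt := r.PromptTokens - r.CachedTokens
	c := new(big.Rat).Mul(big.NewRat(billablePrompt, 1), r.PromptRate)
	c.Add(c, new(big.Rat).Mul(big.NewRat(r.CachedTokens, 1), r.CachedRate))
	c.Add(c, new(big.Rat).Mul(big.NewRat(r.CompletionTokens, 1), r.CompletionRate))
	return c
}

func mustRat(s string) *big.Rat {
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		panic("bad decimal: " + s)
	}
	return r
}

func main() {
	row := Row{
		PromptTokens: 1000, CachedTokens: 400, CompletionTokens: 500,
		PromptRate:     mustRat("0.000000200"),
		CachedRate:     mustRat("0.000000050"),
		CompletionRate: mustRat("0.000000600"),
	}
	fmt.Println(row.Cost().FloatString(9))
}
```

For 1,000 prompt tokens (400 cached) and 500 completion tokens at the example file's rates this yields 0.000440000 USD; pricing all 1,000 prompt tokens at the prompt rate as well would double-count the cached 400.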

3. Dropped: model_price + derivation_policy

The whole temporal price-book apparatus is gone — both tables, the btree_gist GiST exclusion constraints, effective-dating, and the SQL price seed (seed_example_prices.sql). Prices are config now.

Migration approach: clean rewrite of 0002_rating.sql + atlas/c2f1a3b4d5e6_add_rating.py (create only rated_usage with the applied-rate columns). Justified because there is no prod data and the Alembic file was never copied into saturn/alembic (it's a ready-to-copy artifact maintained here). The Alembic docstring flags that if anyone has applied a model_price/derivation_policy version of this revision, they must add a follow-up drop+alter instead.

Flagged gap — fine-tune base linkage

billing_event carries only the engine-reported model name (no derived_from/base_model column). So a fine-tune's base is not plumbed to the rater. Base-direct models price fully today. An ft:<checkpoint> id prices only if the file declares its derived_from (or own rate); otherwise it is unpriced — fail loud, never $0 (tested). Closing the gap means the metering path stamping the base (saturn.io/...base_model) onto the event, or shipping a fine-tune→base map in the file. The premium machinery is complete and tested; only the linkage source is pending.

S3 seam (out of scope, left clean)

The price file loads from a local path (-prices flag / priceFile setting). LoadPriceBook(localPath) is the seam: fetch-from-S3-to-local then load. The create-time price gate (E4) and the rater must read the same file/version — a single fetched copy is the shared artifact.

Tests (all named; full gate + live-PG green)

yaml-base-price-applied, yaml-fine-tune-premium-multiplier, yaml-fine-tune-premium-markup, fine-tune-identity/own-rate-bypass, missing-price-fails-loud-not-zero, applied-rate-stored-on-row, cached-subset-not-double-count, numeric-exactness-no-float, idempotent-rerun, malformed-yaml-fails-closed (16 sub-cases), nil-book-fails-closed, gpu-floor parse, example-file-valid. Integration (live PG): RateWindow_ConformsToOracle (asserts the applied-rate columns), PremiumQuantizedBeforeBilling (self-audit: stored 9dp rate reconstructs cost), UTC-bucketing, index-serves-scan, and the e2e pipeline test (price source swapped to a YAML book).

Gate: go build, go test -race ./..., go vet (+ -tags=integration), golangci-lint v1.64.8, gofmt -l — all clean. Live-Postgres -tags=integration (incl. -race) — all green.

🤖 Generated with Claude Code

Move prices off DB tables (model_price + derivation_policy) onto a versioned
operator-authored YAML price file. The rater loads the file at run start,
projects the resolved per-token rates (fine-tune premium applied in exact
decimal) into a transient TEMP table, rates the last complete hour entirely in
SQL, and FREEZES the applied per-token rates onto each rated_usage row so the
row is self-auditing and immutable.

Contracts:
- New operator-facing price file (config/prices.example.yaml): base per-token
  rates keyed on the HF model id, the single global fine-tune premium policy
  (identity|multiplier|markup), and per-GPU floor rates. Rates are exact-decimal
  strings (never float). Loader fails closed on anything malformed.
- rated_usage gains applied_prompt_rate / applied_cached_rate /
  applied_completion_rate NUMERIC(20,9): the exact rate each rollup billed at.
- Dropped model_price + derivation_policy (and the GiST exclusion constraints,
  effective-dating, and the SQL seed). Clean rewrite of the 0002 migration +
  Alembic (no prod data; the Alembic was never applied to saturn).

Keeps the money discipline: NUMERIC throughout, cost computed+summed in SQL,
cached-subset billable-prompt formula, fail-loud-never-$0 (ErrNoPrice / unpriced
count), idempotent deterministic-id upsert, session-TZ-independent bucketing, and
the Rate() oracle + live-Postgres conformance pinning the SQL.

Fine-tune base linkage is a flagged gap: billing_event carries only the engine
model NAME, so an ft:<checkpoint> id prices only if the file declares its
derived_from (or own rate); otherwise it is unpriced (fail loud). Base-direct
models price fully.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hhuuggoo

Contributor Author

🔋 Battery review — Round 1 (status: ESCALATE)

Tier-2 money-path review of the YAML-rater rework. 19 raw → 4 refuted → 15 confirmed + 7 persona. Stopped on design/money decisions that must not be auto-patched. Two need Hugo; two are confirm-the-non-goal; the rest are mechanical (held back).

🛑 Money decisions for Hugo

1. Rounding model changed: quantize-then-multiply vs sum-then-round (this PR flipped it). The previously merged main summed exact per-event products and rounded once. This PR quantizes each per-token rate to 9dp first, then multiplies, because E1's "store the applied rate on the row" requires a 9dp rate that reconstructs the cost. For a fine-tune whose premium yields a sub-nano residue (e.g. 1-nano base × 1.5 = 0.0000000015 → rounds to 0.000000002), quantize-then-multiply bills slightly more; a residue that rounds down would instead bill slightly less. This is a forced consequence of the store-rate-on-row decision: sum-then-round is incompatible with self-auditing rows. The only actual defect is that doc.go still documents sum-then-round, so code and stated contract disagree. Decision: ratify quantize-then-multiply as the spec (recommended, since it's what self-auditing rows require) and fix the docs + oracle to match; OR keep sum-then-round and drop the rate-on-row guarantee. The former is consistent with your E1 call.

2. Effective-dated pricing removed — confirm the late-arrival semantics. The price table + GiST + effective-dating are gone (intended — prices are YAML now). "Never reprice served traffic" now holds because the row freezes its applied rate, not because price is resolved as-of-event-time. Consequence: a late-arriving event in an already-rated hour, re-rated after a YAML price change, bills at the new rate, not the rate when it was served. Given the rater runs hourly and prices change rarely, the window is tiny — but confirm this is acceptable (it's the same call you reasoned through when choosing rate-on-row).

🛑 Confirm-the-non-goal (self-flagged, not bugs)

  • Fine-tune base-linkage gap — an ft:<checkpoint> whose derived_from isn't in the YAML → ErrNoPrice (fail-loud, correct). The rater structurally can't price the primary fine-tune id format until the metering path plumbs base_model through. Confirm this is an accepted v1 non-goal before merge. (This is the (a)-vs-(b) call already on your plate.)

✅ Mechanical (held back — will fix once decisions land)

  • pricebook.go:362 (CONFIRMED/high): a base rate finer than 9dp is silently rounded — possibly to 0. A price like 0.0000000001 becomes 0.000000000. Needs a load-time guard: reject sub-9dp rates that round to zero (fail-closed on a price the operator clearly meant to be nonzero).
  • The conformance oracle feeds the unquantized rate while production bills the quantized one (latent — current fixture has no residue, but the conformance guard is mis-calibrated for the day one appears). Fix: oracle uses rate.Quantized().
  • event_count is int/INTEGER again (the BIGINT widening didn't carry into the rewrite); two overclaiming test names; dead ErrDerivationChain; stale "pointer-not-copy" comment.

Battery wf_d0f65d1b-dc7, 25 agents, ~1.2M tokens. #14 is NOT merge-ready: it needs decision 1 (rounding spec) + the high-sev sub-9dp-price guard, then a fix pass + re-battery to dry.

@hhuuggoo

Contributor Author

Landed on main via the squashed rating merge (3b22908). Closing the stack.

hhuuggoo closed this on Jun 16, 2026.