Skip to content

Phoebe: one-hop loader gap + reconcile-on-re-rate (Hugo decisions)#17

Closed
hhuuggoo wants to merge 3 commits into
yaml-rater-fixes-2from
yaml-rater-fixes-3
Closed

Phoebe: one-hop loader gap + reconcile-on-re-rate (Hugo decisions)#17
hhuuggoo wants to merge 3 commits into
yaml-rater-fixes-2from
yaml-rater-fixes-3

Conversation

@hhuuggoo

Copy link
Copy Markdown
Contributor

Fix pass on the fine-tune rating money path, implementing the latest #16 battery findings and Hugo's re-rate decision (decisions ledger, "Re-rate semantics"). Stacked on yaml-rater-fixes-2. Money path — every behavioral change pairs with an invariant-named test (RED demonstrated before each fix), full gate incl. live-PG integration + e2e is green.

Contracts

  • Loader one-hop enforcement (closes the $0-derived-fine-tune hole). E3's "one hop only" was enforced on the event-carried path (isDerivationBase in derivedRates/ResolveEvent) but NOT on the file-declared derived_from path. The loader only checked the target was present in pb.base — but an own-rate fine-tune IS in pb.base, so ft:child derived_from: ft:ownrate was accepted, and with a small premium billed $0 (never caught by the round-to-zero guard, which iterates only true bases). The loader now enforces isDerivationBase on every derived_from target. Closing this ALSO closes the round-to-zero coverage hole: a file-declared derived_from can now only point at a true base, so its derived projection is covered by validateDerivedRatesNonZero.

  • Re-rate RECONCILES (delete superseded rollups), not upsert-only — Hugo's decision. A re-run of a window now converges rated_usage to exactly the latest run's output. A deleted CTE removes any in-window rollup the run no longer produces (became ambiguous/unpriced, or its events vanished), atomically with the upsert in the same snapshot — so a rollup that billed clean in a prior run cannot keep billing at its stale cost. "What the latest run says is what bills." Window predicate is half-open + hour-scoped (cmd/rater enforces hour-aligned bounds), so adjacent windows are never touched; a clean identical re-run is a no-op. Surfaced as RateResult.ReconciledDeletions, logged at INFO. This is an idempotency-contract change (was upsert-only); doc.go, rater.go, and store.go updated to state it. Not a reprice of served traffic — the applied rate is still frozen per row; what changes on re-rate is which rollups exist.

  • Migration downgrade symmetry. c2f1a3b4d5e6 downgrade dropped billing_event_rating_instant_ix without IF EXISTS (asymmetric with the idempotent upgrade). Now DROP INDEX IF EXISTS; verified up/down/up clean against live PG (and confirmed the old form errors on a second drop).

Fixes + tests

FIX 1 — loader one-hop gap (pricebook.go)

  • TestLoad_FineTuneDerivingFromFineTuneRejected — ft:child derived_from an own-rate ft: (and a derived ft:) → load error; true-base control loads. RED before.
  • TestLoad_DerivedRateRoundsToZeroFailsClosed — added a derived_from-a-true-base case whose premium zeroes the base → rejected.

FIX 2 — reconcile-on-re-rate (store.go, rater.go, oracle in rater_test.go)

  • TestRater_ReRateDeletesSupersededRollup — A bills clean; mutate → B ambiguous; old row GONE, deletions=1. RED before.
  • TestRater_ReRateSupersedesOneKeepsAnotherSameWindow — in one window, a superseded rollup is deleted while a co-window survivor is updated (disjoint delete/upsert sets).
  • TestRater_ReRateDeletesVanishedRollup — events vanish on re-rate → deleted.
  • TestRater_ReRateIdenticalDataIsNoOp — identical re-run deletes nothing; an adjacent-window rollup survives a scoped re-rate.
  • TestIntegration_ReRateReconciles (live PG) — the deleted CTE fires atomically with the upsert; co-window survivor untouched; identical re-run no-op.
  • TestIntegration_ReRateReconcileLeavesOtherWindowsUntouched (live PG) — window predicate doesn't delete out-of-scope billing.

FIX 3 — mechanical (c2f1a3b4d5e6_add_rating.py, rater_test.go)

  • Migration downgrade IF EXISTS (above).
  • TestRater_RollupCostSelfAudits docstring reworded to what it tests (cost reconstructs from each row's own frozen rate, across heterogeneous base AND derived rollups); the MIN-masking concern is pointed to the ambiguity tests, where the >1-base case is actually excluded from billing — a written rollup is single-rate by construction.
  • TestRater_FineTuneAmbiguousBaseModelFailsLoud — gave ft:clean two same-base events so the "same ft: id with the same base across events is fine" claim is actually exercised (gate keys on COUNT(DISTINCT base_model), not event count: 2 events → 1 rollup).

Constraints honored

  • No float for money; oracle mirrors production (oracle now reconciles too). New guards have proven-teeth (RED-demonstrated) tests.
  • Did not touch WAL/abort/drainer; no base_model re-plumbing.
  • Mixed/unknown-base under one ft: id kept as the fail-loud AmbiguousBaseEvents tripwire — no "bill the known portion" logic added (can't-happen per the ledger).

🤖 Generated with Claude Code

hhuuggoo and others added 3 commits June 15, 2026 22:43
The last pass enforced E3's one-hop rule on the EVENT-carried path
(isDerivationBase in derivedRates/ResolveEvent) but NOT on the
FILE-declared derived_from path. The loader only checked the derived_from
target was PRESENT in pb.base — but an own-rate fine-tune IS in pb.base, so
`ft:child derived_from: ft:ownrate` (a fine-tune deriving from a fine-tune)
was ACCEPTED. With a small premium that derived rate bills $0, and the
round-to-zero guard (validateDerivedRatesNonZero) iterates only TRUE bases,
so it never covered that key — a silent free fine-tune.

Enforce isDerivationBase (the one-hop predicate already shared by the SQL
projection and the oracle) in the loader's derived_from validation. This
closes both the one-hop hole AND the $0-derived coverage hole: a
file-declared derived_from can now only point at a true base, so its
derived projection IS covered by validateDerivedRatesNonZero.

Tests (RED before, GREEN after):
- TestLoad_FineTuneDerivingFromFineTuneRejected: ft:child derived_from an
  own-rate ft: (and a derived ft:) -> load error; true-base control loads.
- TestLoad_DerivedRateRoundsToZeroFailsClosed: + a derived_from-a-true-base
  case whose premium zeroes the base -> rejected (the round-to-zero path).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Hugo's decision (decisions ledger, 'Re-rate semantics'): when the hourly
rater re-runs a window, a rollup that billed CLEAN in a prior run but is now
excluded from priced (became ambiguous-base or unpriced, or its events
vanished) must have its stale rated_usage row DELETED — 'what the latest run
says is what bills.' Previously upsert-only: the stale row kept billing at
its old cost forever.

Add a `deleted` CTE to the single rating statement: after computing priced,
DELETE every rated_usage row whose (auth_id, model_id, window_start) is IN
the rated window [$1,$2) but NOT reproduced in priced (NOT EXISTS), atomic
with the upsert in the same snapshot. The delete-set and upsert-set are
disjoint rows, so Postgres runs both modifying CTEs against one table cleanly.
Window predicate is half-open + hour-scoped: bounds are hour-aligned
(cmd/rater enforces), so an adjacent window's rollups are never touched. A
clean identical re-run reproduces every prior row -> delete matches nothing
-> no-op. Surfaced as RateResult.ReconciledDeletions and logged at INFO.

The oracle store now reconciles too (mirrors production), so the unit tests
exercise the real convergence. doc.go updated: idempotency model is now
reconcile, not upsert-only.

Tests (RED before, GREEN after):
- TestRater_ReRateDeletesSupersededRollup: A bills clean; mutate -> B
  ambiguous; old row GONE, deletions=1.
- TestRater_ReRateDeletesVanishedRollup: events vanish on re-rate -> deleted.
- TestRater_ReRateIdenticalDataIsNoOp: identical re-run deletes nothing; an
  adjacent-window rollup survives a scoped re-rate.
- TestIntegration_ReRateReconciles (live PG): the deleted CTE fires
  atomically with the upsert.
- TestIntegration_ReRateReconcileLeavesOtherWindowsUntouched (live PG):
  window predicate doesn't delete out-of-scope billing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…trings

Mechanical FIX 3 items from the battery:

- Migration c2f1a3b4d5e6 downgrade was asymmetric with its idempotent
  upgrade: it re-adds base_model with ADD COLUMN IF NOT EXISTS, but dropped
  billing_event_rating_instant_ix WITHOUT IF EXISTS, so a re-run / partially-
  applied downgrade could error. Make it DROP INDEX IF EXISTS. Verified
  up/down/up clean against live PG (and confirmed the old form errors on a
  second drop).

- TestRater_RollupCostSelfAudits claimed 'no MIN() masking a divergent rate'
  but the fixture never exercises a divergent rate within a rollup (it can't:
  a written rollup is single-model and, by the ambiguity gate, single-base).
  Reword the docstring to what it tests (cost reconstructs from each row's
  OWN frozen rate, across heterogeneous base AND derived rollups) and point
  the MIN-masking concern to the ambiguity tests where the >1-base case is
  actually excluded from billing.

- TestRater_FineTuneAmbiguousBaseModelFailsLoud's docstring claimed 'the same
  ft: id with the SAME base across events is fine' but ft:clean had a single
  event. Give it TWO same-base events so the legitimate case is actually
  exercised: the gate keys on COUNT(DISTINCT base_model), not event count, so
  a same-ft-same-base multi-event rollup rates as one (2 events, 1 rollup).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hhuuggoo

Copy link
Copy Markdown
Contributor Author

Landed on main via the squashed rating merge (3b22908). Closing the stack.

@hhuuggoo hhuuggoo closed this Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant