Skip to content

Phoebe: enforce E3 one-hop + ft-uniqueness, derived round-to-zero guard#16

Closed
hhuuggoo wants to merge 2 commits into
yaml-rater-fixesfrom
yaml-rater-fixes-2
Closed

Phoebe: enforce E3 one-hop + ft-uniqueness, derived round-to-zero guard#16
hhuuggoo wants to merge 2 commits into
yaml-rater-fixesfrom
yaml-rater-fixes-2

Conversation

@hhuuggoo

Copy link
Copy Markdown
Contributor

Second fix pass on fine-tune pricing, applying the PR #15 battery findings + Hugo's LOCKED E3 decisions to the derived-rate edges the first pass missed. Money path — every behavioral change is fail-closed and paired with an invariant-named, proven-teeth test (each demonstrated RED before its fix).

Stacked on yaml-rater-fixes.

Contracts

  • Derived-rate fail-closed guard (FIX 1). The round-to-zero guard now covers the DERIVED path, not just file-declared rates. At LOAD, a fine-tune premium that drives a NONZERO base component to a rate that rounds to $0 at 9dp (factor "0", a tiny fractional factor on a small base, a negative-rounding markup) is REJECTED — naming the base model, premium policy, and component. A $0 base component stays a legitimate free rate; only a premium that silently zeroes an intended-nonzero base is caught. Never at bill time. Was: every ft: event of that base silently billed $0, counted as rated → lost revenue.

  • ft-uniqueness invariant, fail-LOUD (FIX 2). E3 mints ft:<checkpoint_artifact_id> as a globally-unique uuid4, so one ft: model_id can never carry two base_models. The rater now ENFORCES this: a rollup with COUNT(DISTINCT base_model) FILTER (WHERE via_derived) > 1 is split out of the upsert and counted as a new ambiguous_base anomaly that drives the exit-nonzero path — never silently billed at the MIN (cheaper) base. The rollup grain is unchanged (no re-key); the violation screams instead of under-billing. The false MIN==MAX comment is corrected to state the enforced invariant. Self-audit test added: cost == applied_rate × tokens per rollup.

  • One-hop rule: an own-rate fine-tune is NOT a derivation base (FIX 3). derivedRates() (the SQL projection) and ResolveEvent() (the oracle) now share isDerivationBase(), which excludes any ft:-prefixed entry or derivedFrom key from being a derivation source. An ft: event whose base_model points at another own-rate fine-tune fails loud (ErrNoPrice) — no fine-tune-of-fine-tune. SQL and oracle agree on the corrected contract (conformance integration test added).

  • Migration-chain ownership of base_model (FIX 4). The rating migration's downgrade() no longer drops billing_event.base_model — that column is owned by the billing_event create migration (b1f0c2d3e4a5). A migration reverses only its own additions; the rating downgrade leaves a schema consistent with b1f0c2d3e4a5. Verified the chain up+down against live Postgres (base_model survives the rating downgrade; only the billing_event downgrade removes it).

  • Single-sourced ft: prefix (FIX 4). The SQL no longer hardcodes a literal 'ft:%'; the LIKE pattern is built from the Go fineTunePrefix constant and bound as $3 — one source of truth for the ft: marker on the money path. Also: the deterministic rollup-id md5 input is now length-prefixed (injective, so a '|' in a field can never collide two keys), and a fail-closed e2e assertion was added for a base_model-absent ft: event (ends UNPRICED, never silently rated).

Tests (all proven RED before fix)

  • TestLoad_DerivedRateRoundsToZeroFailsClosed
  • TestRater_FineTuneAmbiguousBaseModelFailsLoud, TestIntegration_FineTuneAmbiguousBaseModelFailsLoud, TestRater_RollupCostSelfAudits
  • TestResolveEvent_FineTuneCannotDeriveFromFineTune, TestIntegration_OneHopFineTuneCannotDeriveFromFineTune
  • TestE2E_FineTuneWithoutBaseModelHeaderIsUnpriced

Gate

go build, go vet (+-tags=integration), golangci-lint v1.64.8, gofmt -l empty, go test -race ./..., and the live-Postgres -tags=integration suite (rating conformance + e2e) all green.

🤖 Generated with Claude Code

hhuuggoo and others added 2 commits June 15, 2026 20:42
Money-path correctness on the DERIVED fine-tune pricing path, applying the
battery findings + Hugo's locked E3 decisions to the edges the first pass missed.
All behavioral changes pair with an invariant-named, proven-teeth test.

FIX 1 — derived-rate round-to-zero fail-closed guard. The sub-9dp guard only
covered FILE-DECLARED rates; a fine-tune PREMIUM that drives base x premium to $0
(factor "0", a tiny fractional factor on a small base, a negative-rounding markup)
silently billed every ft: event $0, counted as rated. validateDerivedRatesNonZero
now rejects, at LOAD, any premium that zeroes a NONZERO base component (a $0 base
component stays a legitimate free rate). Never at bill time.
Test: TestLoad_DerivedRateRoundsToZeroFailsClosed.

FIX 2 — enforce E3's "ft: ids are globally unique" as a fail-LOUD invariant (not
a re-key). rated_usage's key omits base_model, so two ft: deployments with the same
model_id but different bases would merge and bill at the MIN (cheaper) rate. E3
mints ft:<checkpoint_artifact_id> as a uuid4, so this can't happen — the rater now
DETECTS COUNT(DISTINCT base_model) FILTER (WHERE via_derived) > 1 per rollup, splits
that rollup out of the upsert, and counts it as a new ambiguous_base anomaly that
drives exit-nonzero. Converts "merges at cheapest rate" into "screams". Fixes the
false MIN==MAX comment to state the enforced invariant.
Tests: TestRater_FineTuneAmbiguousBaseModelFailsLoud,
TestIntegration_FineTuneAmbiguousBaseModelFailsLoud, TestRater_RollupCostSelfAudits
(cost == applied_rate x tokens per rollup).

FIX 3 — enforce E3's ONE-HOP rule: an own-rate fine-tune is NOT a derivation base.
derivedRates() over-projected every pb.base entry (incl. ft: own-rate fine-tunes)
as a derivation source, letting a fine-tune derive from another fine-tune's own
rate x premium (multi-hop). isDerivationBase() now excludes ft:-prefixed entries
and derivedFrom keys; ResolveEvent (the oracle) mirrors it, so SQL and oracle agree.
Tests: TestResolveEvent_FineTuneCannotDeriveFromFineTune,
TestIntegration_OneHopFineTuneCannotDeriveFromFineTune.

Also single-sources the ft: LIKE pattern from the Go fineTunePrefix constant
(bound as $3, no literal 'ft:%' in SQL) and makes the deterministic rollup-id md5
input length-prefixed (injective — a '|' in auth_id/model_id can never collide two
keys), both inseparable from the rewritten rating statement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… assert

Migration-chain ownership: the rating migration's downgrade() dropped
billing_event.base_model, but that column is OWNED by the billing_event create
migration (b1f0c2d3e4a5), which declares it directly. A migration must reverse only
its OWN additions — dropping base_model on the rating downgrade left the schema
diverged from b1f0c2d3e4a5 (column gone while its owning migration is still applied)
and would silently destroy fine-tune base linkage. The rating downgrade no longer
touches base_model; it is removed only by reversing billing_event's own create.
Verified the chain up+down is consistent against live Postgres (base_model survives
the rating downgrade; only the billing_event downgrade removes it).

e2e fail-closed assertion: TestE2E_FineTuneWithoutBaseModelHeaderIsUnpriced drives
the whole pipe with a fine-tune request that arrives WITHOUT the X-Saturn-Base-Model
header (the propagation seam broken) and asserts the ft: event ends UNPRICED — an
anomaly that screams — never silently rated or $0-billed. The mirror of the
happy-path TestE2E_FineTuneBillsAtBaseTimesPremium.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hhuuggoo

Copy link
Copy Markdown
Contributor Author

🔋 Re-battery — Round 1 (status: ESCALATE), WITH decisions-ledger

Fewer findings (11 raw vs 19 last round — the ledger correctly suppressed settled E1–E4 items). What's left is genuinely new. Most are mechanical (I'll fix); two are a coherent pair of re-rate-semantics design calls for Hugo.

✅ Mechanical (I'll fix — no decision)

  • HIGH: the one-hop loader gap (root of two findings). The last pass enforced one-hop on the event-carried path (isDerivationBase) but NOT on the file-declared derived_from path (loader, pricebook.go:219). So ft:child derived_from: ft:ownrate (a fine-tune deriving from a fine-tune via YAML) is accepted, and with a small premium bills $0 — the derived round-to-zero guard also doesn't cover this key. Fix: call isDerivationBase in the loader check (the comment already claims it does this). Closes both the one-hop gap and the round-to-zero coverage hole.
  • Migration downgrade DROP INDEX without IF EXISTS (asymmetric with the idempotent upgrade); two overclaiming test docstrings (RollupCostSelfAudits claims "no MIN() masking" but the fixture doesn't exercise divergent rates).

🛑 Two design calls for Hugo — both about re-rate semantics (new; not ledger-settled)

The rater is hourly and can be re-run on an hour. These two ask what re-rating should do — a question we haven't pinned:

1. Should re-rate CONVERGE state, or only ever upsert? If a rollup billed clean in run A, then on re-run B becomes ambiguous/unpriced (e.g. new data arrived, or a price changed), it's excluded from the upsert — but the old rated_usage row is never deleted, so it keeps billing at its stale cost while the run merely exits anomaly. Current model is upsert-only (never deletes). Decide: should re-rate reconcile (delete rollups that fell out of "priced"), making "what the latest run says is what bills"? Or is upsert-only intended and stale rows are handled elsewhere (e.g. the Atlas consumer reads only the latest run)? This sets the idempotency contract.

2. Mixed known+unknown base under one ft: id — poison the whole id, or bill the known part? The ambiguity gate counts only derivable bases. An ft: id appearing with one known base and one unknown (header-absent) base in a window isn't flagged ambiguous — the known-base tokens bill, the unknown row lands in unpriced. Decide the fail-closed boundary: does any base-ambiguity on an ft: id poison the entire rollup (strict E3 uniqueness — nothing bills until it's clean), or is billing the known portion + flagging the unknown acceptable?

My leans: (1) reconcile/delete-on-re-rate (matches "what loads is what bills" and your self-auditing-row instinct); (2) strict-poison the whole id (consistent with the fail-loud uniqueness enforcement you already chose). But both change the idempotency/fail-closed contract, so they're yours.


Battery wf_1a70e363-17e. The ledger feature worked. #16 needs the mechanical fixes (I'll do) + decisions 1–2, then a final dry round.

@hhuuggoo

Copy link
Copy Markdown
Contributor Author

Landed on main via the squashed rating merge (3b22908). Closing the stack.

@hhuuggoo hhuuggoo closed this Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant