Phoebe: enforce E3 one-hop + ft-uniqueness, derived round-to-zero guard by hhuuggoo · Pull Request #16 · saturncloud/phoebe

hhuuggoo · 2026-06-15T20:43:37Z

Second fix pass on fine-tune pricing, applying the PR #15 battery findings + Hugo's LOCKED E3 decisions to the derived-rate edges the first pass missed. Money path — every behavioral change is fail-closed and paired with an invariant-named, proven-teeth test (each demonstrated RED before its fix).

Stacked on yaml-rater-fixes.

Contracts

Derived-rate fail-closed guard (FIX 1). The round-to-zero guard now covers the DERIVED path, not just file-declared rates. At LOAD, a fine-tune premium that drives a NONZERO base component to a rate that rounds to $0 at 9dp (factor "0", a tiny fractional factor on a small base, a negative-rounding markup) is REJECTED — naming the base model, premium policy, and component. A $0 base component stays a legitimate free rate; only a premium that silently zeroes an intended-nonzero base is caught. Never at bill time. Was: every ft: event of that base silently billed $0, counted as rated → lost revenue.
ft-uniqueness invariant, fail-LOUD (FIX 2). E3 mints ft:<checkpoint_artifact_id> as a globally-unique uuid4, so one ft: model_id can never carry two base_models. The rater now ENFORCES this: a rollup with COUNT(DISTINCT base_model) FILTER (WHERE via_derived) > 1 is split out of the upsert and counted as a new ambiguous_base anomaly that drives the exit-nonzero path — never silently billed at the MIN (cheaper) base. The rollup grain is unchanged (no re-key); the violation screams instead of under-billing. The false MIN==MAX comment is corrected to state the enforced invariant. Self-audit test added: cost == applied_rate × tokens per rollup.
One-hop rule: an own-rate fine-tune is NOT a derivation base (FIX 3). derivedRates() (the SQL projection) and ResolveEvent() (the oracle) now share isDerivationBase(), which excludes any ft:-prefixed entry or derivedFrom key from being a derivation source. An ft: event whose base_model points at another own-rate fine-tune fails loud (ErrNoPrice) — no fine-tune-of-fine-tune. SQL and oracle agree on the corrected contract (conformance integration test added).
Migration-chain ownership of base_model (FIX 4). The rating migration's downgrade() no longer drops billing_event.base_model — that column is owned by the billing_event create migration (b1f0c2d3e4a5). A migration reverses only its own additions; the rating downgrade leaves a schema consistent with b1f0c2d3e4a5. Verified the chain up+down against live Postgres (base_model survives the rating downgrade; only the billing_event downgrade removes it).
Single-sourced ft: prefix (FIX 4). The SQL no longer hardcodes a literal 'ft:%'; the LIKE pattern is built from the Go fineTunePrefix constant and bound as $3 — one source of truth for the ft: marker on the money path. Also: the deterministic rollup-id md5 input is now length-prefixed (injective, so a '|' in a field can never collide two keys), and a fail-closed e2e assertion was added for a base_model-absent ft: event (ends UNPRICED, never silently rated).

Tests (all proven RED before fix)

TestLoad_DerivedRateRoundsToZeroFailsClosed
TestRater_FineTuneAmbiguousBaseModelFailsLoud, TestIntegration_FineTuneAmbiguousBaseModelFailsLoud, TestRater_RollupCostSelfAudits
TestResolveEvent_FineTuneCannotDeriveFromFineTune, TestIntegration_OneHopFineTuneCannotDeriveFromFineTune
TestE2E_FineTuneWithoutBaseModelHeaderIsUnpriced

Gate

go build, go vet (+-tags=integration), golangci-lint v1.64.8, gofmt -l empty, go test -race ./..., and the live-Postgres -tags=integration suite (rating conformance + e2e) all green.

🤖 Generated with Claude Code

Money-path correctness on the DERIVED fine-tune pricing path, applying the battery findings + Hugo's locked E3 decisions to the edges the first pass missed. All behavioral changes pair with an invariant-named, proven-teeth test. FIX 1 — derived-rate round-to-zero fail-closed guard. The sub-9dp guard only covered FILE-DECLARED rates; a fine-tune PREMIUM that drives base x premium to $0 (factor "0", a tiny fractional factor on a small base, a negative-rounding markup) silently billed every ft: event $0, counted as rated. validateDerivedRatesNonZero now rejects, at LOAD, any premium that zeroes a NONZERO base component (a $0 base component stays a legitimate free rate). Never at bill time. Test: TestLoad_DerivedRateRoundsToZeroFailsClosed. FIX 2 — enforce E3's "ft: ids are globally unique" as a fail-LOUD invariant (not a re-key). rated_usage's key omits base_model, so two ft: deployments with the same model_id but different bases would merge and bill at the MIN (cheaper) rate. E3 mints ft:<checkpoint_artifact_id> as a uuid4, so this can't happen — the rater now DETECTS COUNT(DISTINCT base_model) FILTER (WHERE via_derived) > 1 per rollup, splits that rollup out of the upsert, and counts it as a new ambiguous_base anomaly that drives exit-nonzero. Converts "merges at cheapest rate" into "screams". Fixes the false MIN==MAX comment to state the enforced invariant. Tests: TestRater_FineTuneAmbiguousBaseModelFailsLoud, TestIntegration_FineTuneAmbiguousBaseModelFailsLoud, TestRater_RollupCostSelfAudits (cost == applied_rate x tokens per rollup). FIX 3 — enforce E3's ONE-HOP rule: an own-rate fine-tune is NOT a derivation base. derivedRates() over-projected every pb.base entry (incl. ft: own-rate fine-tunes) as a derivation source, letting a fine-tune derive from another fine-tune's own rate x premium (multi-hop). isDerivationBase() now excludes ft:-prefixed entries and derivedFrom keys; ResolveEvent (the oracle) mirrors it, so SQL and oracle agree. Tests: TestResolveEvent_FineTuneCannotDeriveFromFineTune, TestIntegration_OneHopFineTuneCannotDeriveFromFineTune. Also single-sources the ft: LIKE pattern from the Go fineTunePrefix constant (bound as $3, no literal 'ft:%' in SQL) and makes the deterministic rollup-id md5 input length-prefixed (injective — a '|' in auth_id/model_id can never collide two keys), both inseparable from the rewritten rating statement. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… assert Migration-chain ownership: the rating migration's downgrade() dropped billing_event.base_model, but that column is OWNED by the billing_event create migration (b1f0c2d3e4a5), which declares it directly. A migration must reverse only its OWN additions — dropping base_model on the rating downgrade left the schema diverged from b1f0c2d3e4a5 (column gone while its owning migration is still applied) and would silently destroy fine-tune base linkage. The rating downgrade no longer touches base_model; it is removed only by reversing billing_event's own create. Verified the chain up+down is consistent against live Postgres (base_model survives the rating downgrade; only the billing_event downgrade removes it). e2e fail-closed assertion: TestE2E_FineTuneWithoutBaseModelHeaderIsUnpriced drives the whole pipe with a fine-tune request that arrives WITHOUT the X-Saturn-Base-Model header (the propagation seam broken) and asserts the ft: event ends UNPRICED — an anomaly that screams — never silently rated or $0-billed. The mirror of the happy-path TestE2E_FineTuneBillsAtBaseTimesPremium. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

hhuuggoo · 2026-06-15T20:54:41Z

🔋 Re-battery — Round 1 (status: ESCALATE), WITH decisions-ledger

Fewer findings (11 raw vs 19 last round — the ledger correctly suppressed settled E1–E4 items). What's left is genuinely new. Most are mechanical (I'll fix); two are a coherent pair of re-rate-semantics design calls for Hugo.

✅ Mechanical (I'll fix — no decision)

HIGH: the one-hop loader gap (root of two findings). The last pass enforced one-hop on the event-carried path (isDerivationBase) but NOT on the file-declared derived_from path (loader, pricebook.go:219). So ft:child derived_from: ft:ownrate (a fine-tune deriving from a fine-tune via YAML) is accepted, and with a small premium bills $0 — the derived round-to-zero guard also doesn't cover this key. Fix: call isDerivationBase in the loader check (the comment already claims it does this). Closes both the one-hop gap and the round-to-zero coverage hole.
Migration downgrade DROP INDEX without IF EXISTS (asymmetric with the idempotent upgrade); two overclaiming test docstrings (RollupCostSelfAudits claims "no MIN() masking" but the fixture doesn't exercise divergent rates).

🛑 Two design calls for Hugo — both about re-rate semantics (new; not ledger-settled)

The rater is hourly and can be re-run on an hour. These two ask what re-rating should do — a question we haven't pinned:

1. Should re-rate CONVERGE state, or only ever upsert? If a rollup billed clean in run A, then on re-run B becomes ambiguous/unpriced (e.g. new data arrived, or a price changed), it's excluded from the upsert — but the old rated_usage row is never deleted, so it keeps billing at its stale cost while the run merely exits anomaly. Current model is upsert-only (never deletes). Decide: should re-rate reconcile (delete rollups that fell out of "priced"), making "what the latest run says is what bills"? Or is upsert-only intended and stale rows are handled elsewhere (e.g. the Atlas consumer reads only the latest run)? This sets the idempotency contract.

2. Mixed known+unknown base under one ft: id — poison the whole id, or bill the known part? The ambiguity gate counts only derivable bases. An ft: id appearing with one known base and one unknown (header-absent) base in a window isn't flagged ambiguous — the known-base tokens bill, the unknown row lands in unpriced. Decide the fail-closed boundary: does any base-ambiguity on an ft: id poison the entire rollup (strict E3 uniqueness — nothing bills until it's clean), or is billing the known portion + flagging the unknown acceptable?

My leans: (1) reconcile/delete-on-re-rate (matches "what loads is what bills" and your self-auditing-row instinct); (2) strict-poison the whole id (consistent with the fail-loud uniqueness enforcement you already chose). But both change the idempotency/fail-closed contract, so they're yours.

Battery wf_1a70e363-17e. The ledger feature worked. #16 needs the mechanical fixes (I'll do) + decisions 1–2, then a final dry round.

hhuuggoo · 2026-06-16T01:03:07Z

Landed on main via the squashed rating merge (3b22908). Closing the stack.

hhuuggoo and others added 2 commits June 15, 2026 20:42

hhuuggoo mentioned this pull request Jun 15, 2026

Phoebe: one-hop loader gap + reconcile-on-re-rate (Hugo decisions) #17

Closed

hhuuggoo closed this Jun 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phoebe: enforce E3 one-hop + ft-uniqueness, derived round-to-zero guard#16

Phoebe: enforce E3 one-hop + ft-uniqueness, derived round-to-zero guard#16
hhuuggoo wants to merge 2 commits into
yaml-rater-fixesfrom
yaml-rater-fixes-2

hhuuggoo commented Jun 15, 2026

Uh oh!

hhuuggoo commented Jun 15, 2026

Uh oh!

hhuuggoo commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hhuuggoo commented Jun 15, 2026

Contracts

Tests (all proven RED before fix)

Gate

Uh oh!

hhuuggoo commented Jun 15, 2026

🔋 Re-battery — Round 1 (status: ESCALATE), WITH decisions-ledger

✅ Mechanical (I'll fix — no decision)

🛑 Two design calls for Hugo — both about re-rate semantics (new; not ledger-settled)

Uh oh!

hhuuggoo commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant