Phoebe: finish reconcile observability (loud-log half) + real index EXPLAIN#19

Merged
hhuuggoo merged 1 commit into yaml-rater-fixes-4 from yaml-rater-fixes-5
Jun 16, 2026

@hhuuggoo

Contributor

Stacked on yaml-rater-fixes-4. Finishes the LOG side of the ratified reconcile observability contract (option (c)); the exit-code side was already done in exitCode().

Contracts

Reconcile-delete log-severity contract is now FULLY implemented, matching the exit-code half:

| Run kind | Window | Reconcile-delete log | Exit code |
| --- | --- | --- | --- |
| ROUTINE (default trailing-hours, !windowExplicit) | not operator-chosen | ERROR (page) | 2 |
| EXPLICIT backfill (--since/--until, windowExplicit) | operator-chosen | INFO (intended convergence) | 0 |

The reconcile SEMANTICS are unchanged either way ("what the latest run says is what bills"; store.go always reconciles). Only the log severity (here) and the exit code (cmd/rater, already shipped) depend on windowExplicit. The flag is threaded into Rater.Run rather than recomputed, because only cmd/rater knows how the window was chosen. This mirrors the exit-code gate and keeps rating observability in the rating package.
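For reviewers who want the contract in executable form, here is a minimal sketch of the table above as a pure function. It is illustrative only, not the code in this PR: `reconcileDeleteOutcome` and the `severity` type are hypothetical names, and the function only describes a run in which a reconcile-delete actually happened.

```go
package main

import "fmt"

// severity is a hypothetical type used only for this sketch.
type severity string

const (
	sevInfo  severity = "INFO"
	sevError severity = "ERROR"
)

// reconcileDeleteOutcome maps how the window was chosen to the log severity
// and process exit code for a run that performed a reconcile-delete.
// Assumption: the caller has already established that a delete occurred.
func reconcileDeleteOutcome(windowExplicit bool) (severity, int) {
	if windowExplicit {
		// Operator-chosen backfill window: intended convergence.
		return sevInfo, 0
	}
	// Routine trailing-hours window: a prior bill was rewritten with no
	// operator behind it, so the run is loud and exits nonzero.
	return sevError, 2
}

func main() {
	for _, explicit := range []bool{false, true} {
		sev, code := reconcileDeleteOutcome(explicit)
		fmt.Printf("windowExplicit=%-5v log=%-5s exit=%d\n", explicit, sev, code)
	}
}
```

In the real code the two halves live in different places (the log in the rating package, the exit code in cmd/rater); the sketch merges them only to show that both are keyed on the same flag.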

Fixes

FIX 1 — Rater.Run takes windowExplicit; reconcile-delete logs ERROR on routine, INFO on backfill. A routine run rewriting a prior bill with no operator behind it means events vanished from billing_event (data loss) or an upstream regression dropped them — so it is ERROR (page). An explicit backfill that deletes the same row is intended convergence — INFO. Deletion count + window appear in both. All Run call sites updated (routine cron path passes false).

FIX 2 — test the loud-log half.

  • TestRater_RoutineReconcileDeleteLogsError — a routine reconcile-delete with NO other anomaly (HasAnomaly() stays false) emits an ERROR line. Captures the ERROR/INFO streams into buffers. Demonstrated RED against the pre-fix always-INFO code (empty ERROR stream → assertion fails).
  • TestRater_BackfillReconcileDeleteLogsInfoNoError — the same delete under an explicit backfill logs INFO and emits NO ERROR.
  • Fixed the over-claiming docstring on TestRater_RoutineRunReconcileDeleteExitsNonzero (it pins ONLY the exit code; the log half is now separately tested).
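The buffer-capture approach used by the two new tests can be sketched as follows. This is a rough illustration under stated assumptions, not the test code in this PR: `runReconcile` is a hypothetical stand-in for the code under test, the two `*log.Logger` values are an assumed logging setup, and `checkStreams` is a hypothetical helper expressing what each test asserts.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// runReconcile is a hypothetical stand-in for the code under test: it writes
// the reconcile-delete line to ERROR on a routine run and to INFO on a backfill.
func runReconcile(infoLog, errLog *log.Logger, windowExplicit bool, deleted int) {
	if deleted == 0 {
		return // assumption: nothing deleted means nothing to report
	}
	out := errLog
	if windowExplicit {
		out = infoLog
	}
	out.Printf("reconcile-delete: deleted=%d windowExplicit=%v", deleted, windowExplicit)
}

// captureStreams runs one reconcile with both loggers pointed at buffers and
// returns what each stream received.
func captureStreams(windowExplicit bool, deleted int) (info, errs string) {
	var infoBuf, errBuf bytes.Buffer
	runReconcile(log.New(&infoBuf, "INFO ", 0), log.New(&errBuf, "ERROR ", 0), windowExplicit, deleted)
	return infoBuf.String(), errBuf.String()
}

// checkStreams returns an error describing the first violated expectation.
func checkStreams(windowExplicit bool, info, errs string) error {
	switch {
	case !windowExplicit && errs == "":
		return fmt.Errorf("routine reconcile-delete: ERROR stream is empty")
	case windowExplicit && errs != "":
		return fmt.Errorf("backfill reconcile-delete: unexpected ERROR output %q", errs)
	case windowExplicit && info == "":
		return fmt.Errorf("backfill reconcile-delete: INFO stream is empty")
	}
	return nil
}

func main() {
	for _, explicit := range []bool{false, true} {
		info, errs := captureStreams(explicit, 1)
		fmt.Printf("windowExplicit=%v check=%v\n", explicit, checkStreams(explicit, info, errs))
	}
	// An always-INFO implementation leaves the ERROR stream empty on a routine
	// run, which is the condition the routine check rejects.
	fmt.Println(checkStreams(false, "INFO reconcile-delete: deleted=1", ""))
}
```

Asserting on both streams matters for the backfill case: checking only that INFO is non-empty would not catch an implementation that logs to both.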

FIX 3 — make the reconcile-DELETE index EXPLAIN a real proof. It previously EXPLAINed a standalone SELECT 1 against an EMPTY rated_usage, where the planner seqscans regardless — vacuous. Now it populates rated_usage (50 auths × 200 hours = 10k rows), ANALYZEs, and EXPLAINs the ACTUAL reconcile DELETE (the deleted CTE shape: window_start range + NOT EXISTS anti-join), asserting the plan chooses rated_usage_window_start_ix at DEFAULT cost (seqscan enabled). Verified discriminating: dropping the index makes the plan fall back to the far costlier auth-leading composite (~280 vs ~8 cost), so the assertion fails without the index — not vacuous. A seqscan-off pass follows as belt-and-braces.
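The plan assertion itself can be as small as a scan over the EXPLAIN text output. The sketch below is a hedged illustration, not the integration test in this PR: `planUsesIndex` is a hypothetical helper, the plan lines are hand-written in the style of `EXPLAIN (COSTS OFF)` rather than captured from a real run, and `rated_usage_auth_window_ix` is an invented name standing in for the auth-leading composite index.

```go
package main

import (
	"fmt"
	"strings"
)

// planUsesIndex reports whether any line of EXPLAIN text output is an index
// node that names the given index. It compares whole whitespace-separated
// tokens so that an index whose name merely starts with the target does not
// count as a match.
func planUsesIndex(planLines []string, index string) bool {
	for _, line := range planLines {
		if !strings.Contains(line, "Index") {
			continue
		}
		for _, tok := range strings.Fields(line) {
			if tok == index {
				return true
			}
		}
	}
	return false
}

func main() {
	const want = "rated_usage_window_start_ix"

	// Illustrative plan shapes only; real planner output will differ.
	withIndex := []string{
		"Delete on rated_usage ru",
		"  ->  Nested Loop Anti Join",
		"        ->  Index Scan using rated_usage_window_start_ix on rated_usage ru",
	}
	withoutIndex := []string{
		"Delete on rated_usage ru",
		"  ->  Nested Loop Anti Join",
		"        ->  Index Scan using rated_usage_auth_window_ix on rated_usage ru",
	}

	fmt.Println("with index:   ", planUsesIndex(withIndex, want))
	fmt.Println("without index:", planUsesIndex(withoutIndex, want))
}
```

A check like this is only meaningful when the table is populated and ANALYZEd first, which is the point of the fix: against an empty table the planner never produces the index node, so the assertion cannot discriminate.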

Gate

go build ./..., go vet ./... (+ -tags=integration), golangci-lint v1.64.8, gofmt -l . — all clean. go test -race ./... and PHOEBE_TEST_DATABASE_URL=... go test -tags=integration -race ./... (live PG 16) — all pass, including the e2e suite and the rebuilt index EXPLAIN test (ran, not skipped).

🤖 Generated with Claude Code

…XPLAIN

Completes the ratified reconcile observability contract (option (c)) on the
LOG side; the exit-code half (exitCode() in cmd/rater) was already done.

FIX 1 — thread windowExplicit into Rater.Run and gate the reconcile-delete log
severity on it. A reconcile-DELETE of a previously-billed rated_usage row is now
LOUD on a ROUTINE run (default trailing-hours window, !windowExplicit) — ERROR,
because rewriting a prior bill with no operator behind it means events vanished
from billing_event (data loss) or an upstream regression dropped them — and QUIET
on an EXPLICIT backfill (--since/--until) — INFO, intended convergence. This
mirrors the exit-code gate; the reconcile SEMANTICS are unchanged. All Run call
sites updated to pass windowExplicit (routine cron path passes false).

FIX 2 — test the loud-log half. TestRater_RoutineReconcileDeleteLogsError pins
that a routine reconcile-delete (no other anomaly, HasAnomaly stays false) emits
an ERROR line; TestRater_BackfillReconcileDeleteLogsInfoNoError pins that the same
delete under an explicit backfill logs INFO and NO ERROR. Both capture the INFO
and ERROR streams into buffers. The routine-ERROR assertion was demonstrated RED
against the pre-fix always-INFO code. Fixed the over-claiming docstring on
TestRater_RoutineRunReconcileDeleteExitsNonzero to state it pins ONLY the exit
code, now that the log half is separately tested.

FIX 3 — make the reconcile-DELETE index EXPLAIN test a real proof. It previously
EXPLAINed a standalone `SELECT 1` against an empty rated_usage, where the planner
seqscans regardless. Now it populates rated_usage (50 auths x 200 hours = 10k
rows), ANALYZEs, and EXPLAINs the ACTUAL reconcile DELETE (the `deleted` CTE
shape: window_start range + NOT EXISTS anti-join), asserting the plan chooses
rated_usage_window_start_ix at DEFAULT cost (seqscan enabled). Verified
discriminating: dropping the index makes the plan fall back to the far costlier
auth-leading composite (~280 vs ~8), so the assertion fails without the index.

Gate: go build, go vet (+integration), golangci-lint v1.64.8, gofmt all clean;
go test -race ./... and -tags=integration -race ./... against live PG all pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@hhuuggoo hhuuggoo merged commit 301e9f6 into yaml-rater-fixes-4 Jun 16, 2026
3 checks passed
@hhuuggoo hhuuggoo deleted the yaml-rater-fixes-5 branch June 16, 2026 00:41