Skip to content

Facts-only store: type source-published projections as facts, enforce period contracts at consumption #71

Description

@MaxGhenis

What

Keep Ledger a facts-only store, and make the schema enforce it. The boundary is who asserted the value, not "level vs projection":

  • Facts — immutable, as-published claims with lineage to source cells. This includes sources' own projections: "CBO's January 2026 baseline projects individual income tax receipts of $X in 2027" is a source-backed claim with provenance, exactly like an SOI observation. Type these with an explicit assertion field (observation vs source_projection) so consumers can distinguish measured outcomes from publisher forecasts (CBO baselines, BFP outlooks, SSA trustees tables, TPC/JCT scores).
  • PolicyEngine-computed values — aged, uprated, or forecast levels we derive (e.g., SOI TY2022 aged to TY2025 under CBO factors) — never enter the store. They are regenerable build artifacts, not ledger entries, and they live in Populace as a named, versioned aging implementation that consumes growth-factor facts from Ledger and emits its own lineage.

What Ledger contributes instead of projection objects:

  1. Rigorous period semantics on every fact — the period a value refers to, distinct from the release vintage/label. This covers BE-SILC lagged incomes and heterogeneous Statbel/BFP reference periods without any projection machinery.
  2. Period-contract enforcement at consumption — resolving a profile target at a period other than the fact's reference period hard-fails unless the consumer passes an explicit, named alignment declaration (model id + version + parameters). Ledger records the declaration in diagnostics; it never computes the aligned number.
  3. Basis-aware diagnostics — resolution rows carry fact_period, requested_period, and the declared alignment, so downstream diagnostics distinguish "missed a published fact" from "missed an aged level."

Why this framing (not facts + projections in one schema)

  • Thesis stays clean. Thesis resolves forecasts against Ledger facts as official observations. If the store contains PolicyEngine-computed projections, a forecast can end up scored against partly-model output — circular. A facts-only ledger is a model-free resolution substrate.
  • Append-only stays meaningful. Facts never churn; PE-computed projections churn every CBO update and every aging-model version bump. Storing them alongside facts turns an auditable ledger into a store of volatile derivations wearing provenance costumes.
  • It's the existing boundary. docs/architecture.md and AGENTS.md already assign aging to Populace and forbid "derived facts whose source is Ledger itself." The gap is schema + enforcement, not policy.
  • The populace#212 lesson, re-read: the failure wasn't that aging lived in Populace — it's that un-aged consumption was silent. Calibration was exact against SOI TY2022/23 levels applied at 2024 while simulated 2025 aggregates ran ~6–10% under current-year projections. The fix is making that impossible to do silently, which is a consumption-contract property, not a projection-object property.
  • The shared-implementation argument dissolves. Thesis doesn't want PE-computed aging (its forecasts live in brier; its resolutions need raw facts). Validation comparators (TPC ~$130B FY26, JCT scores) are source-published → facts with assertion: source_projection. The only consumer of PE-computed aged values is Populace calibration, so that's where the code belongs (Period-aware transformation needed for SOI EITC-by-AGI calibration targets populace#116 is the concrete case).

Scope

  • Schema: assertion: observation | source_projection on aggregate facts (key-stable: only non-default values enter canonical key payloads); explicit period-coverage metadata (reference period start/end, basis, source period label, accounting basis) as non-identity provenance — absorbing the design from Add source period metadata to Arch facts #50.
  • Consumer contract: resolution API that enforces the period contract and emits basis-aware diagnostics; consumer rows expose assertion explicitly.
  • Governance: boundary checks reject PE-computed values in facts; AGENTS.md/governance doc updated; ADR recording the decision.
  • Populace: versioned aging library consuming growth-factor facts via the consumer artifact (Publish Ledger target profile artifacts for Populace builds #61), declaring alignments at resolution time (tracked populace-side; see populace#116, populace#212).
  • Geography vintage translation (populace#205) follows the same pattern: a declared consumer-side transform over facts, never an edit to them.

Acceptance

  • CBO/BFP/TPC-style source projections representable as facts with assertion: source_projection, full lineage, byte-stable keys for all existing facts.
  • Resolving a dollar target at a period ≠ the fact's reference period without a declared alignment raises; with a declaration, resolution rows carry basis + alignment metadata end to end.
  • Facts remain byte-identical before/after; no object in the store carries a PolicyEngine-computed value.
  • Diagnostics distinguish "missed a published fact" from "missed an aged level."

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions