Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions .console/backlog.md
Original file line number Diff line number Diff line change
Expand Up @@ -1292,3 +1292,15 @@ tries, never what counts as resolved.
concern in a single cycle) instead of the across-cycles narrowing.
- ⏳ Possible refinement: a "stronger model/effort" ladder rung (needs a model
override plumbed through worker.main → execute.main).

**Ecosystem incomplete-integration remediation — COMPLETE (2026-06-18)**:
14 green-gated PRs. Cross-repo audit found the #313 claimed-complete-but-inert
pattern is NOT systemic — only OC's observer plane had it. Backbone: Custodian #46
(gate self-verifying). WIRED genuine features: DAGExecutor #10, SwitchBoard #21
(p95 demote), PlatformManifest #83 (visibility_scope fail-closed), and the OC
observer plane — #325 (FlakyTestReporter), #326 (Coverage trend/alerts), #327
(merge-decision metrics). Deleted only superseded duplicates (Custodian #47,
DAGExecutor #11, SourceRegistry #14, TeamExecutor #12, CoreRunner #20).
ContextLifecycle = KEEP. Each observer wire pruned its names from d12_baseline
(D12 gate confirms 0). B2 root cause = content-less secret artifact (infra, not a
code bug). Full record: docs/design/INCOMPLETE_INTEGRATION_REMEDIATION.md.
11 changes: 11 additions & 0 deletions .console/log.md
Original file line number Diff line number Diff line change
Expand Up @@ -7017,3 +7017,14 @@ best-effort. Pruned export_metrics_json from d12_baseline (D12 gate confirms 0).
1 test; reviewer suite 113 green (tests/ root + the #322 dedicated CI job);
audit B2-env + doctor + D12 clean. Another false-positive corrected — the
instrumenter wasn't unwired, only its export surface was.

## 2026-06-18 — Remediation campaign COMPLETE: roadmap reframed to completion

Final wrap-up. Updated docs/design/INCOMPLETE_INTEGRATION_REMEDIATION.md to the
completed plan of record: reframed around COMPLETION (operator correction — wire
features, don't delete); recorded all 14 PRs; flipped parse_visibility_scope to
WIRE-done; removed the TeamExecutor RxP false-positive; recorded the 3 adversarial
corrections (cross-repo consumers, indirect dispatch, convention hooks), the
observer-plane completions, the superseded-dup deletes, ContextLifecycle=KEEP,
and the B2 root cause (content-less secret artifact = infra, not a code bug).
Backlog updated. Loop complete.
183 changes: 101 additions & 82 deletions docs/design/INCOMPLETE_INTEGRATION_REMEDIATION.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
---
status: in-progress
status: implemented
---

# Ecosystem Incomplete-Integration Remediation

**Status:** in-progress (autonomous /loop) — plan of record
**Status:** COMPLETE — plan of record + outcomes.
**Origin:** the PR #313 post-mortem widened into a question: how much "built +
tested but never wired" debt exists across the platform, and is the #313 pattern
(a PR claiming completion while shipping inert code) systemic? This doc records
the cross-repo audit, the adversarial dispositions, and the phased remediation.
the cross-repo audit, the adversarial dispositions, and what shipped.
Companion to [Self-Heal Ladder](./SELF_HEAL_LADDER.md).

## Headline finding (adversarial)
Expand All @@ -18,84 +18,103 @@ all 11 src-bearing repos (excluding VideoFoundry + PrivateManifest per the
private-repo deferral) found that what looks "unwired" is overwhelmingly:
- **honestly-declared cross-repo library API** awaiting an external consumer
(SourceRegistry, RepoGraph, CoreRunner runners, OperatorConsole CxRP boundary,
CritiqueExecutor/TeamExecutor RxP runtimes) — **not debt**;
- **framework-dispatched** CLI/route handlers grep can't see — **not debt**;
- **superseded convenience wrappers** — small, benign DELETEs.
CritiqueExecutor RxP runtime) — **not debt**;
- **framework-dispatched / indirect-dispatch** handlers grep can't see — **not debt**;
- **superseded convenience wrappers** — small, benign deletes.

The genuine #313 pattern — a numbered PR explicitly claiming "Complete /
Verified / integrated end-to-end" while the code was never wired — was found in
**exactly one place: OperationsCenter's observer plane** (PRs #247, #279, #250).
Everywhere else, the unwired symbols trace to pre-PR-process **direct pushes**
(no false claim) or are deliberate API surface.

Per-repo genuinely-unwired counts (of public symbols scanned): ContextLifecycle
3/189, CoreRunner 6/21, CritiqueExecutor 0/23, Custodian ~5/250, DAGExecutor
2/30, OperatorConsole 3/96, PlatformManifest 3/40, RepoGraph 1/136,
SourceRegistry 4/97, SwitchBoard 4/125, TeamExecutor 3/34. OperationsCenter
carries the bulk (~111 baselined), concentrated in the observer plane.

## Enforcement backbone (do first — a hollow gate makes everything else theater)

- **B1 — Custodian `--only` silent-skip** *(DONE, Custodian #46)*: a gate naming
a detector the install lacks filtered to an empty set and passed green. Now
refuses unknown ids. The gate is self-verifying.
- **B2 — fleet `.venv` is pinned behind** (`0fa072f`, no D12/DC10). Local/fleet
pre-push gates are no-ops; CI carries the real check via `custodian@main`.
Action: reinstall custodian at the pyproject pin in each repo's venv. Operational,
not a code PR; sequence carefully so running fleet watchers aren't disrupted.
- **B3 — CI pins `@main`, not the pyproject pin** for the audit job. With B1 in
place a vanished detector now errors, but the audit workflow should pin a
known-good Custodian SHA rather than floating `@main`.

## Phases

Dispositions are adversarially adjudicated here — no item is deferred to a human.
WIRE = finish the integration; DELETE = remove symbol + its tests (provably dead,
no external contract); KEEP = legitimate API/dispatch surface, do not touch.

### Phase 1 — enforcement backbone *(DONE)*
- Custodian #46 (B1). B2/B3 tracked above.

### Phase 2 — WIRE genuinely-broken integrations
| repo | item | action |
|------|------|--------|
| TeamExecutor | `TeamExecutorRunner` never registered as an RxP runtime → the whole executor is undispatchable | WIRE: add the RxP runtime/entry-point registration |
| PlatformManifest | `parse_visibility_scope` — `visibility_scope` field shipped in `data/platform_manifest.yaml`, parsed nowhere | WIRE into `load_repo_graph`; surface on the graph |
| SwitchBoard | p95 latency computed but the demote heuristic reads only mean latency (outlier-sensitive) | WIRE p95 into the AdjustmentEngine evaluate path |
| DAGExecutor | `NodeRunner` Protocol defined but the runner dispatch isn't typed to it (inert) | WIRE: type the runner registry as `dict[NodeType, NodeRunner]` |

### Phase 3 — DELETE provably-dead superseded code
| repo | items | rationale |
|------|------|-----------|
| Custodian | `core/runner.py`, `policy/filter.py`, `policy/architecture.py` (+ tests) | orphaned Phase-1/5 scaffold, superseded by `cli/runner.py` + `detectors/structure.py`; imported by zero src |
| ContextLifecycle | `select_docs`, `SessionPaths.archived_target` | superseded by `select_docs_split` / `archived_root()` |
| PlatformManifest | `load_default_capabilities`, `parse_also_hosts` | convenience mirror of a used fn; `also_hosts` vestigial for public PM |
| SourceRegistry | `remote_url`, `list_files_changed_between`, `PatchRegistry.all_patches` | over-ported primitives + redundant accessor, 0 callers, 0 tests |
| SwitchBoard | `LaneSelector.plan_routes` | redundant wrapper; prod constructs `DecisionPlanner` directly |
| TeamExecutor | `TeamSession`, `TeamConfig.verifier` | scaffold state never adopted; back-compat shim with no consumer |
| DAGExecutor | `DagGraph.get_node` | test-only accessor; executor walks by `layers()` |
| CoreRunner | `CoreRunnerError` | base exception raised/caught nowhere |

### Phase 4 — OperationsCenter observer plane (the real #313 cluster)
The genuine debt. Per-subsystem adjudication required (wire-vs-delete), because
some may duplicate live collectors:
- `CoverageTrendManager` (#279) — trend/volatility/regression
- `CoverageAlertManager` (#279) — alert generation/routing
- `FlakyTestReporter` (#247) — flaky query/markdown/table
- `MergeDecisionInstrumenter` / `export_metrics_json` (#250) — reviewer metrics
Approach: for each, determine whether a live entrypoint should drive it (WIRE) or
it duplicates existing wiring / is abandoned (DELETE). Then prune the
corresponding names from `audit.d12_baseline` so the gate stops grandfathering
them. This is the largest phase; sequence after the backbone + small wins.

### Phase 5 — ratchet cleanup
As Phases 3-4 land, prune the burned-down names from each repo's
`audit.d12_baseline`, so D12 (once the venvs are current) catches NEW regressions
against a smaller accepted set.

## KEEP list (audited — NOT debt, do not "complete")
SourceRegistry `__all__` engine surface; RepoGraph topology/diff/capabilities
query methods + `RepoGraphDiff`; CoreRunner injectable runners
(`ManualRunner`/`Dispatcher`/HTTP runners/`read_invocation`); OperatorConsole
`cxrp_capture` boundary; CritiqueExecutor (clean); Custodian `Finding` helper
methods; SwitchBoard dormant `metrics.py` (deliberate instrument-now/expose-later).
Verified / integrated end-to-end" while the code was never wired — was confined
to **OperationsCenter's observer plane** (PRs #247, #279, #250). Everywhere
else, the unwired symbols traced to pre-PR-process direct pushes (no false
claim) or were deliberate API surface.

## Guiding principle (operator correction, mid-campaign)

**Complete genuine features — don't delete them.** Deletion is the *last resort*,
used only for provably-superseded duplicates where the feature already exists in
the live path (and called out explicitly when used). The observer-plane features
were therefore **wired in to work as featured**, not removed.

## Three adversarial corrections (the audit's blind spots)

The per-repo audit ran each repo in isolation and over-flagged. Three findings
the corrections caught before acting:
1. **Cross-repo consumers.** `TeamExecutorRunner` was flagged "undispatchable"
but is **already wired** by OC's `backends/team_executor/adapter.py`. Dropped.
2. **Indirect dispatch.** `MergeDecisionInstrumenter` was flagged "never
instantiated" but is **already wired** via the module-level
`record_decision_outcome` (→ `get_instrumenter()`) the reviewer calls. The
real gap was only the unused `export_metrics_json` surface.
3. **Convention hooks flip dispositions.** PlatformManifest `parse_visibility_scope`
was flagged DELETE, but a PlatformManifest loader convention (surfaced as a
PreToolUse hook) mandates the loader fail-closed on visibility — so it was a
**WIRE**, not a delete.

## Outcomes (14 green-gated PRs)

### Enforcement backbone
- **Custodian #46** — `--only` silent-skip closed: a gate naming an absent
detector now fails loudly instead of green-passing. The gate is self-verifying.

### WIRE — genuine integrations completed
- **DAGExecutor #10** — `NodeRunner` Protocol now enforced on the runner dispatch.
- **SwitchBoard #21** — demote heuristic gates on **p95** tail latency (was mean),
catching tail degradation the mean hid. (Also deleted `plan_routes` + `p50`.)
- **PlatformManifest #83** — `load_repo_graph` now **fail-closes on
`visibility_scope`** (the documented security boundary), via the previously
unwired `parse_visibility_scope`. (Also deleted `load_default_capabilities`.)
- **OperationsCenter #325** — `FlakyTestReporter` wired into the live pytest
plugin: each session persists results in its format + writes a markdown report.
- **OperationsCenter #326** — `CoverageTrendManager` + `CoverageAlertManager`
wired into `RepoObserverService`: bridge `CoverageSignal → CoverageSnapshot`,
then trend + regression + alerts compute and persist on every observation.
- **OperationsCenter #327** — merge-decision metrics surfaced: the reviewer now
exports `MergeDecisionInstrumenter` metrics each cycle (the instrumenter was
already collecting; only `export_metrics_json` had no reader).

### DELETE — provably-superseded duplicates (nothing to complete)
- **Custodian #47** — orphan Phase-1/5 scaffold (`core/runner.py`,
`policy/{filter,architecture}.py`), superseded by the live `cli/runner.py` path.
- **DAGExecutor #11** — `DagGraph.get_node` (test-only accessor).
- **SourceRegistry #14** — over-ported git-ops primitives + a redundant patch
accessor (0 callers, 0 tests).
- **TeamExecutor #12** — scaffold-era `TeamSession` + `TeamConfig.verifier` shim.
- **CoreRunner #20** — `CoreRunnerError` (raised/caught nowhere, not exported).

### KEEP — audited, not debt
- **ContextLifecycle** (`select_docs`, `SessionPaths.archived_target`): clean,
*documented* convenience accessors in a high-risk path-dispatched engine;
`select_docs` would need a 9-test rewire and `archived_target` cascades — low
value, real risk. Left intentionally.
- The honestly-deferred cross-repo API surfaces (SourceRegistry `__all__`,
RepoGraph query/diff methods, CoreRunner injectable runners, OperatorConsole
`cxrp_capture`), CritiqueExecutor (clean), Custodian `Finding` helpers,
SwitchBoard dormant `metrics.py`.

### Ratchet
Each observer wire pruned its now-integrated names from OC
`audit.d12_baseline` (`format_flaky_tests_markdown`, `save_test_results`,
`detect_regression`, `generate_alerts`, `save_snapshot`, `save_alert`,
`export_metrics_json`); the **D12 gate confirms 0** after each — proof the wires
are real, not baseline-hidden. The still-unwired public methods
(`calculate_trend_slope`, `calculate_volatility_score`, `get_historical_data`,
`categorize_alert`, `get_routes_for_alert`) remain baselined.

## Backbone notes (infra / follow-ups, not code-fixed here)

- **B2 boundary-artifact (root cause).** OC's CI `audit` job is red on every PR
with a single MED **B2** finding *even though* `REPOGRAPH_BOUNDARY_ARTIFACT_FILE`
is materialized in CI. Diagnosis (read-only): `detect_b2` emits its generic
"not provided" message only when the artifact loads with **no error but yields
zero boundary names** — so the secret-decoded artifact is present and parseable
but **content-less** (no boundary names). This is an **infra/secret issue**
(`REPOGRAPH_BOUNDARY_ARTIFACT_B64` needs a real disclosure artifact), not a
Custodian code bug — hence advisory-only and left for the operator. One genuine
minor Custodian follow-up: B2's message should distinguish *provided-but-no-names*
from *not-provided* (the loader already has the provenance).
- **Audit gate is advisory.** OC `main` is unprotected and the reviewer
LGTM-merges over the advisory (B2-red) `audit` check. Making it required is
blocked on B2 above.
- **Fleet `.venv` pinned behind** (`0fa072f`, no D12/DC10) — local/fleet pre-push
gates are no-ops; CI carries the real check via `custodian@main`. Reinstall at
the pin per repo when convenient (sequence so running watchers aren't disrupted).
Loading