1 change: 1 addition & 0 deletions .ddx/beads.jsonl
@@ -155,6 +155,7 @@
{"acceptance":"1. go test ./cmd/bench -run Lane -count=1 passes.\\n2. make gosec passes.\\n3. master CI rerun for the fix commit passes.","closing_commit_sha":"f6bc36b288e435e031695c034fa006e2c2b135ce","created_at":"2026-05-15T17:43:55.074379692Z","description":"PROBLEM\\nBranch CI for pushed master f792fdaa failed in the Security scan step. gosec reported G703 at cmd/bench/lanes.go:755 for os.WriteFile(tmp, data, 0o600) inside writeTextAtomic.\\n\\nROOT CAUSE\\ncmd/bench/lanes.go implemented its own atomic temp write for operator-selected benchmark lane paths instead of using the repo's centralized internal/safefs wrappers where intentional user-selected filesystem writes are documented for gosec.\\n\\nPROPOSED FIX\\nUse safefs.MkdirAll and safefs.WriteFileAtomic in writeTextAtomic, preserving behavior while routing the intentional write through the audited wrapper.\\n\\nNON-SCOPE\\nNo benchmark lane behavior changes and no release workflow changes.","events_attachment":"fizeau-d99b099e/events.jsonl","id":"fizeau-d99b099e","issue_type":"bug","labels":["area:ci","area:benchmark","kind:bug","release-blocker"],"priority":0,"schema_version":1,"session_id":"eb-bbb995c0","status":"closed","title":"ci: route benchmark lane atomic writes through safefs","updated_at":"2026-05-15T17:49:22.174642782Z"}
{"acceptance":"1. TestScorePolicy_MinPowerDoesNotDisableDefaultPolicy asserts that with MinPower=8, the default policy still gives the +15 subscription/free bonus to a free subscription candidate, and that bonus is reflected in candidate.ScoreComponents['quota_health'].\n2. TestScorePolicy_MinPowerSonnetBeatsOpus_DefaultPolicy: given two candidates (sonnet power=8 cost=0 subscription quota_ok; opus power=10 cost=0.045 subscription quota_ok), with MinPower=8, the sonnet candidate's total score is higher than opus.\n3. TestScorePolicy_MaxPowerDoesNotDisableProviderPreference asserts that with MaxPower=8 set, a 'subscription-first' provider preference still adds the +30 quota_health bonus.\n4. Existing service_routing_test.go tests (TestRouteWith* etc.) still pass — regression guard for non-power-bounded scoring.\n5. go test ./internal/routing/... ./... -count=1 passes.","claimed-at":"2026-05-14T23:31:41Z","claimed-machine":"eitri","claimed-pid":"2237932","closing_commit_sha":"8b7de9967773575bf37bed00eeb83d0dcc72a095","created_at":"2026-05-14T23:28:12.475336117Z","description":"PROBLEM\ninternal/routing/score.go:60 and :108 wrap the policy-aware scoring (cheap/default/smart preferences for local/subscription/cost) and the provider-preference bias in `if !hasPowerBounds`. When a caller sets --min-power N or --max-power M, those branches are skipped entirely. Result: among candidates that all satisfy the bounds, the routing falls through to raw base+quota+power-9-bonus scoring, which systematically prefers higher-power-metered models over equivalent-quality subscription-free models.\n\nObserved today: ddx try --harness claude --min-power 8 routed to claude/opus (power 10, score ~76, metered ~$0.045/turn) instead of claude/claude-sonnet-4-6 (power 8, score 186 per fiz route-status without bounds, subscription/free). Across the 11 harness-interface refactor bead executions, this cost real opus tokens for work sonnet would have handled fine.\n\nROOT CAUSE\n- score.go:52: hasPowerBounds := cand.MinPower \u003e 0 || cand.MaxPower \u003e 0\n- score.go:60: `if !hasPowerBounds { switch policy ... }` — the entire policy preference block (deployment locality, quota health, cost-class ranking) is gated off\n- score.go:108: `if !hasPowerBounds { switch cand.ProviderPreference ... }` — provider-preference bias gated off\n\nThe intent was probably: 'when user expresses explicit power bounds, just enforce bounds and ignore default biases.' But power bounds are orthogonal to policy preference — a user asking for power\u003e=8 still wants the cheaper, subscription-backed candidate among the eligible set.\n\nPROPOSED FIX\nRemove the `if !hasPowerBounds` guards at score.go:60 and score.go:108. Apply policy preference and provider-preference bias unconditionally to in-bounds candidates. Keep the below-MinPower / above-MaxPower penalties as they are (lines 227-238) — those correctly demote out-of-bounds candidates without disabling policy scoring for in-bounds ones.\n\nThe natural-power bonus at line 222 (`Power * 12` when no bounds set) can stay gated — that's a different signal: 'when no bounds expressed, treat higher power as a quality proxy.' When bounds ARE expressed, the bounds themselves encode the power preference; double-counting via natural-power-bonus would be wrong.\n\nNON-SCOPE\n- Do not change the below-MinPower / above-MaxPower penalty values (lines 227-238).\n- Do not change the natural-power bonus gating at line 222.\n- Do not change quota-pressure, cooldown, sticky-affinity, or perf scoring.\n- Do not introduce new score components.","events_attachment":"fizeau-dc3cf359/events.jsonl","id":"fizeau-dc3cf359","issue_type":"bug","labels":["area:routing","kind:bug","spec:CONTRACT-003"],"owner":"erik","parent":"fizeau-361f4f4b","priority":0,"schema_version":1,"session_id":"eb-74696b5c","spec-id":"FEAT-004","status":"closed","title":"routing: policy-aware scoring is skipped when MinPower/MaxPower set, preferring expensive opus over free sonnet","updated_at":"2026-05-14T23:50:42.226707412Z","work-heartbeat-at":"2026-05-14T23:31:41.684667418Z"}
{"acceptance":"1. `go test . -run \"TestResolveRoute.*Snapshot|TestResolveRoute.*ModelConstraint|TestResolveRoute.*Provider|TestResolveRoute.*Power|TestResolveRoute.*Correlation\" -count=1` passes with new coverage proving ResolveRoute consumes snapshot rows and preserves hard pins.\n2. `go test ./internal/routing/... -run \"Power|Gating|Sticky|FilterReason\" -count=1` passes.\n3. `rg -n \"liveProviderEntries|probeEndpointDiscoveredIDs|buildRoutingInputsWithCatalog\" service_routing.go` returns no legacy live-discovery routing path, or any remaining match is isolated behind a snapshot assembly helper and not called by ResolveRoute default routing.\n4. A fixture test proves fresh cache data avoids direct provider model discovery during default ResolveRoute.","claimed-at":"2026-05-12T20:34:38Z","claimed-machine":"eitri","claimed-pid":"2683682","closing_commit_sha":"a77cf3e46351dd53e0b5d0ae68d94ba141e576fc","created_at":"2026-05-12T19:40:49.211427501Z","dependencies":[{"issue_id":"fizeau-dc79af6b","depends_on_id":"fizeau-ab20bdb8","type":"blocks","created_at":"2026-05-12T19:42:35Z"}],"description":"CONTEXT\n`ResolveRoute` currently calls `buildRoutingInputsWithCatalog`, which enumerates harnesses/providers and probes provider models through service-specific helpers such as `liveProviderEntries`. SD-005 requires auto-routing to consume the same enriched available-model snapshot exposed by `fiz models`, then expand snapshot rows into dispatch candidates. The routing engine itself should be preserved; the input source should change.\n\nCHANGE\nAdd a snapshot-to-`routing.Inputs` adapter and wire `ResolveRoute` to use it. The adapter must preserve hard pins (`Harness`, `Provider`, exact `Model`), IncludeByDefault/default-deny behavior, auto-routability/exact-pin-only behavior, context/tool/reasoning gates, quota/cooldown signals, endpoint utilization, and sticky server-instance affinity. Unknown snapshot models must remain exact-pinnable but excluded from unpinned automatic routing.\n\nIN-SCOPE FILES\n- service_routing.go\n- internal/routing input tests if adapter logic belongs there\n- service_routing_test.go / service_model_resolution_test.go\n\nOUT-OF-SCOPE\n- Changing score component names.\n- RouteStatus rendering.\n- Root ListModels migration, which should already be complete before this bead starts.","events_attachment":"fizeau-dc79af6b/events.jsonl","execute-loop-heartbeat-at":"2026-05-12T20:34:38.991045562Z","id":"fizeau-dc79af6b","issue_type":"task","labels":["area:routing","area:service","kind:feature","plan-2026-05-12-sd005"],"owner":"erik","parent":"fizeau-e6145528","priority":0,"schema_version":1,"session_id":"eb-9a6c4720","spec-id":"SD-005","status":"closed","title":"routing: feed ResolveRoute from snapshot-derived candidates","updated_at":"2026-05-12T21:10:28.996306048Z"}
{"acceptance":"1. go test -race ./internal/harnesses/codex -run TestParseCodexStream_CommandExecutionDuration -count=20 passes.\n2. make test-race passes locally.\n3. GitHub CI run for the fix commit passes.","created_at":"2026-05-15T17:58:38.567835285Z","description":"PROBLEM\nGitHub CI run 25932885597 for master commit 83583bd6 passed build, vet, lint, security scan, vulnerability scan, format, rename noise, normal go test, website smoke, and adapter pytest, but failed in Test (race). The failing package was internal/harnesses/codex.\n\nOBSERVED FAILURE\nmake test-race runs CGO_ENABLED=1 go test -race -count=1 ./... . internal/harnesses/codex.TestParseCodexStream_CommandExecutionDuration failed with: tool_result duration = 17ms, want \u003e=20ms.\n\nROOT CAUSE HYPOTHESIS\nThe test sleeps for exactly 20ms and asserts DurationMS \u003e= 20. Under race instrumentation and scheduler/timer behavior, the measured wall duration can round/truncate below 20ms even though the parser behavior is correct. This is a test determinism issue, not evidence of a routing or safefs regression.\n\nIN-SCOPE FILES\n- internal/harnesses/codex/runner_test.go\n\nOUT-OF-SCOPE\n- No production parser behavior changes unless required by the test investigation.\n- No CI workflow broadening or check disabling.\n- Do not rerun CI until it happens to pass; make the assertion robust.\n\nEXPECTED FIX\nMake TestParseCodexStream_CommandExecutionDuration deterministic under race instrumentation while still proving command execution durations are captured as positive elapsed time and associated with the emitted tool result.","events":[{"actor":"","body":"[{\"ac\":1,\"text\":\"go test -race ./internal/harnesses/codex -run TestParseCodexStream_CommandExecutionDuration -count=20 passes.\",\"kind\":\"test-name\",\"verifiable\":true},{\"ac\":2,\"text\":\"make test-race passes locally.\",\"kind\":\"prose\",\"verifiable\":false},{\"ac\":3,\"text\":\"GitHub CI run for the fix commit passes.\",\"kind\":\"prose\",\"verifiable\":false}]","created_at":"2026-05-15T17:58:59.560634431Z","kind":"ac-quality-low","source":"preclaim-ac-quality","summary":"score=0.33 threshold=0.50 verifiable=1/3"},{"actor":"erik","body":"{\"decision\":\"warn\",\"decision_source\":\"pre_claim_intake\",\"detail\":\"readiness check timed out after 1s (fizeau-df282732)\",\"fingerprint\":\"3de07a18c763b3c48ab0477e8e46aa75d36519a2ac4f977ef55c0e706e89d78e\",\"policy_mode\":\"warn-only\",\"reason\":\"timeout\",\"rule_id\":\"pre_claim_intake.timeout\",\"suggested_action\":\"revise the rewrite so it preserves every explicit commitment\"}","created_at":"2026-05-15T17:59:00.526996973Z","kind":"intake.warn","source":"ddx work","summary":"timeout"}],"id":"fizeau-df282732","issue_type":"bug","labels":["area:ci","area:harnesses","kind:bug","release-blocker","ac-quality:needs-refinement"],"priority":0,"schema_version":1,"status":"open","title":"ci: make codex stream duration test race-stable","updated_at":"2026-05-15T17:59:00.543199969Z"}
{"acceptance":"1. service.go drops geminiharness import; ReadAuthEvidence call replaced with AccountHarness.AccountStatus\n2. go test ./... passes; structural-diff fixtures pass for gemini rows","claimed-at":"2026-05-15T00:01:38Z","claimed-machine":"eitri","claimed-pid":"2288357","closing_commit_sha":"2c7ec9ec13c93d3f0f1b5c274402e98508513b82","created_at":"2026-05-14T21:01:21.760697018Z","dependencies":[{"issue_id":"fizeau-e15cce96","depends_on_id":"fizeau-c47606b1","type":"blocks","created_at":"2026-05-14T21:01:22Z"},{"issue_id":"fizeau-e15cce96","depends_on_id":"fizeau-279414c7","type":"blocks","created_at":"2026-05-14T21:01:22Z"}],"description":"Source manifest: .ddx/beads-harness-interface-refactor.yaml\nSource bead ID: BEAD-HARNESS-IF-07B\nGoverning plan: docs/helix/02-design/plan-2026-05-14-harness-interface-refactor.md\n\nSee the manifest and the plan for full context, touch-point inventory, and sub-sequence ordering.","events_attachment":"fizeau-e15cce96/events.jsonl","id":"fizeau-e15cce96","issue_type":"task","labels":["area:harness","kind:refactor","spec:CONTRACT-004","harness:gemini"],"owner":"erik","priority":0,"schema_version":1,"session_id":"eb-95bb32d4","source-id":"BEAD-HARNESS-IF-07B","spec-id":"PLAN-HARNESS-INTERFACE-REFACTOR","spec-ref":"docs/helix/02-design/plan-2026-05-14-harness-interface-refactor.md","status":"closed","title":"Migrate service.go gemini consumers","updated_at":"2026-05-15T00:08:56.3028918Z","work-heartbeat-at":"2026-05-15T00:01:38.304936781Z"}
{"acceptance":"1. internal/runtimesignals/ drops claudecache, codexcache, geminicache imports\n2. collect.go consumes harnesses.QuotaHarness via interface assertion through harnessByName\n3. collect_test.go uses synthetic QuotaHarness fixtures from harnesstest\n4. go test ./... passes","claimed-at":"2026-05-15T06:10:17Z","claimed-machine":"eitri","claimed-pid":"4026270","closing_commit_sha":"250f35212dd547a24e218ecc99af9c32bec8e9e7","created_at":"2026-05-14T21:01:21.988271304Z","dependencies":[{"issue_id":"fizeau-e2819b1a","depends_on_id":"fizeau-6286e6cf","type":"blocks","created_at":"2026-05-14T21:01:23Z"},{"issue_id":"fizeau-e2819b1a","depends_on_id":"fizeau-c5e4f20a","type":"blocks","created_at":"2026-05-14T21:01:23Z"},{"issue_id":"fizeau-e2819b1a","depends_on_id":"fizeau-1f53973b","type":"blocks","created_at":"2026-05-14T21:01:23Z"}],"description":"Source manifest: .ddx/beads-harness-interface-refactor.yaml\nSource bead ID: BEAD-HARNESS-IF-09\nGoverning plan: docs/helix/02-design/plan-2026-05-14-harness-interface-refactor.md\n\nSee the manifest and the plan for full context, touch-point inventory, and sub-sequence ordering.","events_attachment":"fizeau-e2819b1a/events.jsonl","id":"fizeau-e2819b1a","issue_type":"task","labels":["area:harness","kind:refactor","spec:CONTRACT-004","area:runtimesignals","ac-quality:needs-refinement"],"owner":"erik","priority":0,"schema_version":1,"session_id":"eb-dae8962c","source-id":"BEAD-HARNESS-IF-09","spec-id":"PLAN-HARNESS-INTERFACE-REFACTOR","spec-ref":"docs/helix/02-design/plan-2026-05-14-harness-interface-refactor.md","status":"closed","title":"Migrate internal/runtimesignals/collect.go to QuotaHarness interface","updated_at":"2026-05-15T06:24:37.214301638Z","work-heartbeat-at":"2026-05-15T06:10:17.41162107Z"}
{"acceptance":"1. go test ./internal/modelregistry/... ./internal/runtimesignals/... ./agentcli/... -run 'Refresh|SingleFlight|Coalesc|Stale|Warm|Heartbeat' -count=1 passes.\\n2. A test proves repeated stale refresh requests create at most one in-flight refresh per provider/field key.\\n3. A test proves refresh failure records refresh_failed or stale freshness state without immediately retrying in a loop.\\n4. A test proves successful refresh atomically advances snapshot version or captured_at metadata.\\n5. A public sync refresh/warmup entrypoint exists for long-running callers, and tests show a caller can invoke it periodically without bypassing locks.\\n6. rg -n 'single.?flight|coalesc|refresh_failed|refresh_all|RefreshAll|RefreshRouting|Warm' internal agentcli service*.go returns production coverage.","claimed-at":"2026-05-13T18:16:15Z","claimed-machine":"eitri","claimed-pid":"2683682","closing_commit_sha":"9817db744f75683d6a43511045cc64592522d39e","created_at":"2026-05-13T17:48:05.254022293Z","dependencies":[{"issue_id":"fizeau-e415fad3","depends_on_id":"fizeau-75710e70","type":"blocks","created_at":"2026-05-13T17:49:17Z"}],"description":"Implement or tighten Fizeau's synchronous refresh coordinator so stale health/quota/model facts are refreshed through one lock-coordinated path instead of each CLI process launching its own probes.\\n\\nDesired behavior:\\n- Fizeau has no required daemon; its contract is synchronous refresh plus cross-process locks, single-flight/coalescing, TTLs, cooldowns, bounded concurrency, and atomic snapshot writes.\\n- Repeated refresh requests for the same provider/field class coalesce into one in-flight operation per TTL/cooldown window.\\n- Refreshes are bounded, concurrent, and per-provider rate/cooldown aware.\\n- Routing-relevant refresh covers health, quota, model availability/discovery, context/tools/reasoning support, billing/effective cost metadata where dynamic, and utilization if available.\\n- Refresh results update the fiz models snapshot atomically with version/timestamp/freshness metadata.\\n- Refresh failure records refresh_failed/stale state for scoring and diagnostics; it must not create an unbounded retry loop.\\n- Long-running callers such as a DDx server can maintain async freshness by invoking this synchronous API from their own heartbeat; route correctness must not depend on that integration.\\n\\nIn scope: cache/refresh coordinator internals, process lock/single-flight logic, public refresh/warmup entrypoint for long-running callers, focused tests using fake refreshers.\\n\\nOut of scope: CLI flag UX; route scoring; changing DDx upstream server behavior if that lives outside this repo.","events_attachment":"fizeau-e415fad3/events.jsonl","execute-loop-heartbeat-at":"2026-05-13T18:16:15.925718952Z","id":"fizeau-e415fad3","issue_type":"task","labels":["area:cache","area:service","kind:feature","plan-2026-05-13-snapshot-routing"],"owner":"erik","parent":"fizeau-4847c99e","priority":0,"schema_version":1,"session_id":"eb-494b37e6","spec-id":"ADR-012","status":"closed","title":"Add refresh coordinator semantics without CLI refresh storms","updated_at":"2026-05-13T18:45:04.716278117Z"}
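The `fizeau-dc3cf359` record argues that power bounds and policy preference are orthogonal. A toy scoring function makes the proposed gating concrete: policy bonuses always apply, the `Power * 12` natural-power bonus applies only when no bounds are set, and out-of-bounds candidates are penalized rather than dropped. Only the +15 free-subscription bonus and the `Power * 12` weight come from the record; the types, the metered-cost weight, and the penalty size are assumptions, and this is not `internal/routing/score.go`.

```go
package main

import "fmt"

// candidate is a hypothetical, heavily simplified routing candidate; the real
// internal/routing types and score components are not shown in this diff.
type candidate struct {
	Name         string
	Power        int
	CostPerTurn  float64 // metered USD per turn; 0 means subscription/free
	Subscription bool
}

// bounds mirrors --min-power / --max-power; 0 means unset.
type bounds struct {
	MinPower, MaxPower int
}

const (
	freeSubscriptionBonus = 15.0   // from the record's acceptance criteria
	naturalPowerWeight    = 12.0   // from the record: Power * 12 when no bounds are set
	meteredCostWeight     = 1000.0 // assumption: points lost per USD/turn
	outOfBoundsPenalty    = 1000.0 // assumption: size of the out-of-bounds demotion
)

// score applies policy preference unconditionally, gates only the
// natural-power bonus on the absence of bounds, and demotes candidates
// outside the requested power range.
func score(c candidate, b bounds) float64 {
	hasPowerBounds := b.MinPower > 0 || b.MaxPower > 0
	s := 0.0

	// Policy preference: no longer wrapped in `if !hasPowerBounds`.
	if c.Subscription && c.CostPerTurn == 0 {
		s += freeSubscriptionBonus
	}
	s -= c.CostPerTurn * meteredCostWeight

	// Natural-power bonus stays gated: explicit bounds already express the
	// caller's power preference, so adding it would double-count.
	if !hasPowerBounds {
		s += float64(c.Power) * naturalPowerWeight
	}

	if b.MinPower > 0 && c.Power < b.MinPower {
		s -= outOfBoundsPenalty
	}
	if b.MaxPower > 0 && c.Power > b.MaxPower {
		s -= outOfBoundsPenalty
	}
	return s
}

func main() {
	sonnet := candidate{Name: "sonnet", Power: 8, CostPerTurn: 0, Subscription: true}
	opus := candidate{Name: "opus", Power: 10, CostPerTurn: 0.045, Subscription: true}
	b := bounds{MinPower: 8}
	fmt.Printf("MinPower=8: sonnet=%.1f opus=%.1f\n", score(sonnet, b), score(opus, b))
}
```

With `MinPower: 8` the free sonnet-like candidate keeps its bonus and outranks the metered opus-like one, which is the outcome acceptance item 2 asks for.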
21 changes: 17 additions & 4 deletions internal/harnesses/codex/runner_test.go
@@ -525,10 +525,13 @@ func TestParseCodexStream_CommandExecutionFailure(t *testing.T) {
}

func TestParseCodexStream_CommandExecutionDuration(t *testing.T) {
	const sleepDuration = 50 * time.Millisecond
	const minDurationMS = int64(10)

	reader, writer := io.Pipe()
	go func() {
		fmt.Fprintln(writer, `{"type":"item.started","item":{"id":"item_slow","type":"command_execution","command":"sleep 1","status":"in_progress"}}`)
		time.Sleep(20 * time.Millisecond)
		time.Sleep(sleepDuration)
		fmt.Fprintln(writer, `{"type":"item.completed","item":{"id":"item_slow","type":"command_execution","command":"sleep 1","aggregated_output":"done","exit_code":0,"status":"completed"}}`)
		writer.Close()
	}()
@@ -540,17 +543,27 @@ func TestParseCodexStream_CommandExecutionDuration(t *testing.T) {
	}
	close(out)

	var result harnesses.ToolResultData
	var (
		result    harnesses.ToolResultData
		sawResult bool
	)
	for ev := range out {
		if ev.Type != harnesses.EventTypeToolResult {
			continue
		}
		if err := json.Unmarshal(ev.Data, &result); err != nil {
			t.Fatalf("unmarshal tool_result: %v", err)
		}
		sawResult = true
	}
	if !sawResult {
		t.Fatal("missing tool_result event")
	}
	if result.ID != "item_slow" {
		t.Fatalf("tool_result ID: got %q, want %q", result.ID, "item_slow")
	}
	if result.DurationMS < 20 {
		t.Fatalf("tool_result duration = %dms, want >=20ms", result.DurationMS)
	if result.DurationMS < minDurationMS {
		t.Fatalf("tool_result duration = %dms, want >=%dms after %s sleep", result.DurationMS, minDurationMS, sleepDuration)
	}
}

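The `fizeau-e2819b1a` record moves `internal/runtimesignals/collect.go` to consume `harnesses.QuotaHarness` "via interface assertion through harnessByName". In Go that idiom is a type assertion on a general harness value: harnesses opt in by implementing the narrower interface, and the collector needs no per-harness cache imports. The method sets, the `harnessByName` signature, and the fixture types below are assumptions for illustration, not the repo's definitions.

```go
package main

import (
	"fmt"
	"sort"
)

// Harness and QuotaHarness are hypothetical stand-ins; the real interfaces in
// internal/harnesses are not shown in this diff.
type Harness interface {
	Name() string
}

// QuotaHarness is an optional capability: only harnesses that can report
// quota implement it.
type QuotaHarness interface {
	Harness
	QuotaRemaining() (float64, error)
}

// quotaFixture is a synthetic harness that reports quota.
type quotaFixture struct {
	name string
	left float64
}

func (f quotaFixture) Name() string                     { return f.name }
func (f quotaFixture) QuotaRemaining() (float64, error) { return f.left, nil }

// plainFixture is a synthetic harness without quota support.
type plainFixture struct{ name string }

func (p plainFixture) Name() string { return p.name }

// collectQuota resolves each name through harnessByName and asks for quota
// only when the harness implements QuotaHarness.
func collectQuota(names []string, harnessByName func(string) (Harness, bool)) map[string]float64 {
	out := make(map[string]float64)
	for _, n := range names {
		h, ok := harnessByName(n)
		if !ok {
			continue
		}
		qh, ok := h.(QuotaHarness) // interface assertion: opt-in capability check
		if !ok {
			continue
		}
		left, err := qh.QuotaRemaining()
		if err != nil {
			continue
		}
		out[n] = left
	}
	return out
}

func main() {
	registry := map[string]Harness{
		"alpha": quotaFixture{name: "alpha", left: 0.75},
		"beta":  plainFixture{name: "beta"},
	}
	lookup := func(name string) (Harness, bool) {
		h, ok := registry[name]
		return h, ok
	}
	quota := collectQuota([]string{"alpha", "beta"}, lookup)
	keys := make([]string, 0, len(quota))
	for k := range quota {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %.2f\n", k, quota[k])
	}
}
```

Synthetic fixtures like `quotaFixture` are the kind of thing acceptance item 3 places in `harnesstest`; the names here are arbitrary.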