feat: classify Claude CLI errors into actionable user messages#963
Merged
gtrrz-victor merged 19 commits intomainfrom Apr 17, 2026
Merged
feat: classify Claude CLI errors into actionable user messages#963gtrrz-victor merged 19 commits intomainfrom
gtrrz-victor merged 19 commits intomainfrom
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Enhances entire explain --generate summary generation by switching to stream-json Claude CLI output for observable progress, adding typed Claude CLI error classification for actionable user messaging, and introducing a TTY-aware deadline strategy (no deadline on interactive TTY by default; safety deadline in non-TTY/CI with optional settings override).
Changes:
- Add streaming progress plumbing (
agent.StreamingTextGenerator, progress events, and progress rendering inexplain). - Introduce typed Claude CLI error classification (
*claudecode.ClaudeError) and improved user-facing error formatting. - Add
summary_timeout_secondssetting with helper API and tests.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/explain.go | Adds progress output, deadline resolution, idle watchdog, and error formatting for explain --generate. |
| cmd/entire/cli/explain_test.go | Updates/extends tests for deadline behavior, progress writer, watchdog, and error formatting. |
| cmd/entire/cli/summarize/summarize.go | Threads optional progress callback into summary generation and preserves typed errors. |
| cmd/entire/cli/summarize/summarize_test.go | Updates call sites and adds tests ensuring progress callback wiring. |
| cmd/entire/cli/summarize/claude.go | Prefers streaming generation when available and preserves typed errors. |
| cmd/entire/cli/summarize/claude_test.go | Adds tests for streaming preference and progress emission expectations. |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Updates summary generation call signature. |
| cmd/entire/cli/settings/settings.go | Adds summary_timeout_seconds and SummaryTimeoutValue(). |
| cmd/entire/cli/settings/settings_test.go | Adds tests for SummaryTimeoutValue() behavior. |
| cmd/entire/cli/agent/agent.go | Introduces progress phases/types and StreamingTextGenerator interface. |
| cmd/entire/cli/agent/capabilities.go | Adds declared capability + helper for StreamingTextGenerator. |
| cmd/entire/cli/agent/capabilities_test.go | Adds coverage for AsStreamingTextGenerator. |
| cmd/entire/cli/agent/claudecode/generate.go | Switches to --output-format stream-json, parses streaming output, emits progress, returns typed errors. |
| cmd/entire/cli/agent/claudecode/response.go | Adds NDJSON stream parser (streamClaudeResponse). |
| cmd/entire/cli/agent/claudecode/response_test.go | Adds fixture-based and edge-case tests for streaming parser. |
| cmd/entire/cli/agent/claudecode/error.go | Defines ClaudeError and envelope/stderr classifiers. |
| cmd/entire/cli/agent/claudecode/error_test.go | Adds unit tests for typed errors and classifiers. |
| cmd/entire/cli/agent/claudecode/claude_test.go | Adds tests for streaming phases, typed errors, and delegation behavior. |
| cmd/entire/cli/agent/claudecode/testdata/*.jsonl | Adds stream-json fixtures for success and is_error envelope. |
6bdd3f2 to
2965e24
Compare
…ue helper Optional override for `entire explain --generate` summary deadline. Treats unset / 0 / negative as "caller picks default" so the explain layer can distinguish "user explicitly chose a cap" from "use mode-based default". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 4dc60abd40b5
Introduces a typed error so callers (specifically cli/explain.go) can map structured Claude CLI failures to user-facing messages via errors.As. No classifier logic yet; envelope and stderr classifiers follow in Task 4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 6901d9051456
classifyEnvelopeError maps api_error_status (401/403 -> auth, 429 -> rate_limit, other 4xx -> config, else unknown) for the structured exit-0 failure path observed in the live CLI. classifyStderrError is a small fallback for crashes that exit non-zero before producing an envelope; matches a deliberately short auth-phrase list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 6ebe1c92ce0b
2965e24 to
cd538a4
Compare
Contributor
Author
Contributor
Author
|
@BugBot review |
4 tasks
Extends the response envelope to capture is_error/api_error_status,
makes GenerateText return typed *ClaudeError (auth/rate_limit/config/
cli_missing/unknown), drops the duplicate error wraps in summarize,
and adds formatCheckpointSummaryError in explain.go to map each kind
to a user-facing message with specific remediation steps.
Fixes the double-prefix issue ("failed to generate summary: failed to
generate summary: ...") and makes exit-0 operational errors (invalid
model, auth failure, rate limit) actionable instead of being
misreported as JSON parse failures.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 0218a86edc58
cd538a4 to
92449c4
Compare
…e fallback When the subprocess exits non-zero with no stderr (segfault, OOM, SIGKILL), the prior fallback returned just "claude CLI error (kind=unknown)" with no diagnostic context. Now includes the exit code when ExitCode is non-zero, producing "claude CLI error (kind=unknown, exit=137)" so operators can distinguish between OOM (137), SIGKILL (137/9), assertion failures, etc. Existing zero-ExitCode + empty-Message case is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: b4e206369330
Claude CLI emits {is_error:true, result:null, api_error_status:5xx} on
internal failures. The previous parser gated on response.Result != nil,
so these envelopes fell through to the array decode path which failed,
returning a generic "missing result item" error and discarding the
IsError / APIErrorStatus signals.
Now: when is_error is true on the single-object path, return the
envelope even if Result is absent. Upstream classifyEnvelopeError then
produces a typed ClaudeError carrying the API status instead of losing
it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: f52979b5e5bc
…i_error_status is absent Some Claude CLI failure modes emit is_error:true with no api_error_status (older builds, internal/assertion failures). Previously these classified as ClaudeErrorUnknown even when the result text contained a clear auth signal like "Invalid API key", so users lost the auth-specific "run claude login" message. Extract the phrase scan into hasAuthPhrase and call it from both paths: - envelope path when APIStatus == 0 - stderr fallback path (unchanged behavior, shared implementation) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 0a18a726a89a
…enerateFromTranscript
The //nolint:wrapcheck annotations in claude.go and summarize.go exist
specifically to prevent wrapping that would break errors.As at the explain
layer. Nothing was guarding that contract — a refactor that replaced
`return err` with `fmt.Errorf("...: %v", err)` (note %v, not %w) would
silently flatten the typed error and regress every actionable user message.
Both new tests inject a *ClaudeError via a stub, wrap the result in an outer
fmt.Errorf("...: %w", ...) to simulate hand-off through additional layers,
and assert errors.As still recovers the typed Kind and APIStatus.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: ede5fafb3e0d
formatCheckpointSummaryError told users to add summary_timeout_seconds to settings.local.json on timeout, but generateCheckpointAISummary still hard-codes a 30s deadline and never reads SummaryTimeoutValue(). Users following the advice would see no behavioral change — a misdirection worse than no advice at all. Also trim the aspirational "no deadline interactive, 5 min in CI" from the setting's doc, which describes follow-up wiring that doesn't exist yet in this PR. The setting field and accessor remain (this PR's scope was to add the typed-error classification; wiring ships in the follow-up that already exists as a stacked PR). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: baeff00847fe
- formatCheckpointSummaryError no longer claims to be "the single point where the final CLI wording is decided." That invariant isn't enforced, and the stacked follow-up PR already adds a second caller site for the typed-error mapping. - ClaudeError field comments: keep the "user-safe" note on Message (has genuine WHY value for downstream formatting), collapse the redundant per-field "0 if not applicable" lines into one type-level sentence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: fd04416bfa6b
…Message is empty
User-perspective simulation of each failure mode uncovered two scenarios
where the user-facing output ended with a bare colon and nothing else:
"Claude failed to generate the summary: "
This happened because formatCheckpointSummaryError's default branch
reads claudeErr.Message directly, and two real failure paths can set
Message to "":
- envelope with is_error:true and result:null (now preserved by the
earlier null-result fix; previously the ugly parse error leaked
through instead)
- subprocess crashes with no stderr output (OOM / SIGKILL)
Extract formatClaudeErrorSuffix helper that prefers Message, falls
back to APIStatus ("Anthropic API returned HTTP 500"), then ExitCode
("claude CLI exited with code 137"), then empty string.
After this change:
null-result-500 ->
"Claude failed to generate the summary (Anthropic API returned HTTP 500)"
oom-137 ->
"Claude failed to generate the summary (claude CLI exited with code 137)"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c9216a8d86a7
…observability # Conflicts: # cmd/entire/cli/agent/claudecode/generate.go # cmd/entire/cli/settings/settings.go # cmd/entire/cli/summarize/summarize_test.go
Four consolidations based on a test audit: 1. Drop TestClaudeError_ErrorEmptyMessageNegativeExitCode — same ExitCode != 0 code path as the 137 case; language guarantees the negative path. 2. Drop the AuthFromResultCaseInsensitive table row — case-insensitivity is already exercised by the stderr path (AuthCaseInsensitive) and both paths share hasAuthPhrase. 3. Consolidate the two preservation tests into one end-to-end test through the real production chain: stubTextGenerator returns *ClaudeError, ClaudeGenerator.Generate must preserve it, then GenerateFromTranscript must preserve it. A single test now pins both //nolint:wrapcheck contracts through the real pipe rather than via two separate stub setups. 4. Collapse the three TestFormatCheckpointSummaryError_UnknownEmptyMessage* tests into one table-driven test with three rows (APIStatus / ExitCode / Message). Same coverage, idiomatic Go. Net: -81 lines in the PR diff, same behavioral coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 27555b2a8e4a
… response shape
The object-path fix landed earlier but the array path still required
Result != nil, so an event-array response like
[..., {"type":"result","is_error":true,"api_error_status":401,"result":null}]
fell through to a generic "missing result item" error and discarded the
structured auth/rate-limit/config signal — exactly the failure mode this
PR is trying to improve.
Array path now mirrors the object path: when a result element has
is_error:true, return the envelope regardless of Result presence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 27dfc83479f4
peyton-alt
added a commit
that referenced
this pull request
Apr 17, 2026
…tream-progress Resolve conflicts from the typed-error classification work (PR #963) landing on the base branch while this PR kept streaming-mode changes: - response.go: keep streaming NDJSON parser + base's parseGenerateTextResponse fix for is_error:true with null result - error.go/error_test.go: take base's ExitCode fallback, hasAuthPhrase helper, and envelope auth-phrase heuristic - explain.go: keep wired summary_timeout_seconds message, incorporate base's formatClaudeErrorSuffix fallback for empty Message - settings.go: drop "not yet consumed" comment (now wired by this PR) - claude_test.go/summarize: keep streaming tests, take base's regression test for *ClaudeError preservation through Generate/GenerateFromTranscript - summarize_test.go: add nil progress arg to PreservesClaudeError call site Entire-Checkpoint: e7274f4d26d3
generateCheckpointAISummary previously checked timeoutCtx.Err() in an OR with errors.Is(err, ...). If the summarizer returned a real typed error (e.g. *ClaudeError for auth / rate-limit / invalid-model) AND the context had happened to hit its deadline or been canceled independently, the typed error was silently replaced with a bare context-sentinel wrap, routing the user to the wrong "safety deadline" guidance instead of "run claude login". Only classify as ctx-derived when the error chain actually contains the sentinel. GenerateText already returns the bare sentinel directly when ctx triggered the subprocess kill, so this classification still fires for the real timeout/cancel cases. Also wrap with %w err (not %w context.Canceled) so the actual cause survives errors.As. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 85e3e7b95a2a
Two user-visible gaps in formatClaudeErrorSuffix:
1. When the subprocess failed before producing an exit code (launch
failure, signal before exec), generate.go set ExitCode = -1 as a
sentinel. formatClaudeErrorSuffix rendered that literally:
"... (claude CLI exited with code -1)"
A -1 exit code is never a real value. Render it as abnormal
termination instead.
2. A ClaudeError with all-zero fields (reachable via
classifyEnvelopeError("", nil, 0) when envelope is is_error:true with
result:null and no api_error_status) produced empty suffix output:
"Claude failed to generate the summary"
No diagnostic, no remediation. Now renders a clear sentinel
"(no diagnostic detail available from Claude CLI)" so the user at
least knows we had no signal to work with.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 53c7be23d264
…sertions Two related fixes: 1. formatCheckpointSummaryError's DeadlineExceeded branch hardcoded "Claude" / "sonnet" / "Anthropic" — but `entire explain --generate` now supports multiple summary providers (claude-code, codex, gemini, ...) after PR #887. Users who selected a non-Claude provider and hit the safety deadline were being told to debug the wrong CLI and service. Rewrite the message to reference "the summary provider" neutrally so it's accurate for any selection. 2. TestFormatCheckpointSummaryError_DeadlineExceeded only asserted positive substrings ("5m", "safety deadline", "status.anthropic.com"). A regression that re-introduced the removed "summary_timeout_seconds" advice would still pass. Add negative assertions guarding against that re-introduction and against reintroducing any Claude/Anthropic/sonnet wording. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 89244efd069a
All 4 items are real user-visible bugs flagged by multiple reviewers:
A. Bare colon in typed Auth/RateLimit/Config branches
formatCheckpointSummaryError fixed the default branch in round 2
(formatClaudeErrorSuffix) but missed the typed branches, which still
did unconditional ": %s" interpolation. When Claude emits is_error
with a real HTTP status but result:null (401/429/4xx), users saw:
"Claude authentication failed: \nRun `claude login` and retry"
Route all three typed branches through a new formatMessageSuffix
helper that drops the colon when Message is empty.
B. Wrong timeout value in user-facing error
generateCheckpointSummary unconditionally passed checkpointSummaryTimeout
(the 30s default) to formatCheckpointSummaryError, even when the parent
context had already set a shorter effective deadline. Users hitting a
10s upstream timeout were told "did not return within 30s safety deadline"
— misdirecting them at the provider/network when the real cause was
caller-side budget. Thread the actual applied deadline out of
generateCheckpointAISummary and feed it to the formatter.
C. GenerateText ctx-sentinel check dropped typed errors under ctx race
Round 2 fixed this at the explain layer (generateCheckpointAISummary),
but GenerateText itself still checked ctx.Err() FIRST on cmd.Run()
failure. If Claude emitted an is_error envelope AND ctx happened to be
done, the typed ClaudeError was discarded in favor of a bare
context.DeadlineExceeded. Reorder: parse envelope first, fall back to
sentinel classification only when no structured signal is available.
D. isExecNotFound too broad — misclassifies permission errors as
"CLI not installed"
The old check matched any *exec.Error, including permission-denied,
invalid-executable-format, and other launch-time failures. Those
would send operators to `brew install claude-code` when the real
fix is `chmod +x ~/.local/bin/claude`. Tighten to require
errors.Is(execErr.Err, exec.ErrNotFound).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8ed4da8e9756
Contributor
Author
|
@BugBot review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 0bc148e. Configure here.
peyton-alt
added a commit
that referenced
this pull request
Apr 17, 2026
Switches `entire explain --generate` from silent wait to real-time stream-driven progress. Independent of typed Claude error classification (#963). ### User-visible changes **Interactive (TTY):** live progress phases with in-place updates Generating checkpoint summary... (transcript: 47.0 KB) → Sending request to Anthropic... → Anthropic responded (TTFT 1.9s, 35.9k cached input tokens) -- generating... → Writing summary... (~1.2k tokens) ✓ Summary generated (3.1s, 1.9k output tokens) ✓ Summary generated and saved **Non-TTY (CI):** same lines, one per event (no in-place updates) **ACCESSIBLE=1:** one line per event, no ANSI escape sequences ### Architecture - `agent.StreamingTextGenerator` optional capability interface with `AsStreamingTextGenerator` helper. Plain-error contract — typed error classification will be wired separately (see #963). - Claude Code implements it via `GenerateTextStreaming` in a new file (`generate_streaming.go`), leaving the existing `GenerateText` path untouched. Falls back internally to `--output-format json` when the CLI rejects streaming flags. - `summaryProgressWriter` renders `GenerationProgress` phases using existing `statusStyles` for consistent styling with `entire status`. - TTY-gated deadline: no deadline interactive (Ctrl+C is the timeout); idle-based 5min watchdog non-TTY with a 30min wall-clock cap. - `ProgressFn` threads through `summarize.GenerateFromTranscript` and `ClaudeGenerator`; preserved as nil for the condensation path. ### Scope note This PR intentionally does not map Claude auth/rate-limit/config errors to actionable messages — that's #963's typed-error work. When #963 lands on main, a small follow-up will wire typed errors into the streaming path. Users hitting stream errors in the meantime see a generic "claude CLI reported error: <text>" (HTTP status included when present). ### Tests - Stream parser: success + error fixtures, malformed-line resilience, reader errors - GenerateTextStreaming: phase emission, envelope error surfacing, legacy fallback, context cancellation - Progress writer: all phases, Generating dedup, lastActivity tracking - Idle watchdog: stale-fire, bump-prevents-fire, zero-idle noop - Deadline resolver: interactive no-deadline vs non-interactive idle Entire-Checkpoint: 56ee3789dcea
gtrrz-victor
approved these changes
Apr 17, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Makes
entire explain --generatesurface actionable error messages when theClaude CLI fails, instead of generic "failed to parse summary JSON" output
or silent empty errors.
Problem
The Claude CLI returns most operational errors (auth, rate limit, invalid
model) at exit 0 with
is_error: truein its JSON envelope. The priorimplementation didn't check
is_error, tried to JSON-parse the error textas a
Summary, and exposed users to messages like:failed to parse summary JSON: ... (response: There's an issue with the selected model...)failed to generate summary: failed to generate summary: claude CLI failed (exit -1):(double-wrapped)For CLI crashes with no output (OOM, SIGKILL) users saw literally
Claude failed to generate the summary:— nothing after the colon.What changed
Typed error classification. Introduces
*claudecode.ClaudeErrorwith aKinddiscriminator (auth/rate_limit/config/cli_missing/unknown) populated from HTTP status or stderr fallback:api_error_status)is_error:true+result:null→ preserved with its HTTP status instead of discarded as a parse errorcli_missingerrors.Asall the way fromclaudecode.GenerateText→summarize.Generate→ explainUser-facing message mapping.
formatCheckpointSummaryErrorturns each Kind into an actionable message with remediation:Claude authentication failed: <detail>+Run `claude login` and retryClaude rejected the summary request due to rate limits or quota: <detail>+Wait and retryClaude rejected the summary request: <detail>+Check your Claude CLI config and selected modelClaude CLI is not installed or not on PATHSettings groundwork. Adds
summary_timeout_secondsfield andSummaryTimeoutValue()accessor. The field is not yet consumed by the generate path; the stacked follow-up PR wires it into TTY-gated deadline selection.Scope
All behavior changes are contained to:
cmd/entire/cli/agent/claudecode/— typed error + classifier + envelope parser enrichmentcmd/entire/cli/summarize/— drops duplicatefmt.Errorfwrapping soerrors.Assurvives the layer boundary (pinned by regression tests)cmd/entire/cli/explain.go— typed-error mapping for user outputcmd/entire/cli/settings/— new field + accessorMerged with main to resolve conflicts with PR #887 (summary provider) and PR #953 (remote metadata fetch).
GenerateTextintentionally runs the subprocess directly instead of throughagent.RunIsolatedTextGeneratorCLI— the shared helper collapses stderr/exit into a string, but Claude's envelope-based semantics need structured access to all three.Test plan
mise run check(fmt + lint + unit + integration + e2e canary)entire explain -c <id> --generatewith invalid modelentire explain -c <id> --generatewith claude logged outNote
Medium Risk
Changes the subprocess execution and error-handling path for Claude-based summary generation, which could alter behavior across timeout/cancel and CLI failure modes. Well-covered by new unit tests, but still impacts a user-facing critical workflow (
explain --generate).Overview
entire explain --generatenow preserves and classifies Claude CLI failures into a typed*claudecode.ClaudeError(auth/rate-limit/config/CLI-missing/unknown) instead of flattening them into generic parse/exec errors.Claude Code text generation now runs the
claudesubprocess directly to capture stdout/stderr/exit code, parses enriched JSON envelopes (includingis_error:trueandresult:null), and maps structured failures to user-facing remediation messages; summary-generation layers stop re-wrapping errors soerrors.Ascan detect the typed error at the explain layer.Settings add a new
summary_timeout_secondsfield plusSummaryTimeoutValue()accessor (not yet wired into the generate path), and tests are expanded to cover envelope parsing, error classification, timeout/cancel behavior, and non-empty diagnostics.Reviewed by Cursor Bugbot for commit 0bc148e. Configure here.