fix(proxy): spec-complete message_start + usage mapping on Claude→OpenAI chat streaming#322
Open
lishuceo wants to merge 2 commits into
Open
fix(proxy): spec-complete message_start + usage mapping on Claude→OpenAI chat streaming#322lishuceo wants to merge 2 commits into
lishuceo wants to merge 2 commits into
Conversation
…p_sequence) The Anthropic streaming spec requires the message_start snapshot to carry "content": [], "stop_reason": null, "stop_sequence": null. Omitting "content" crashes the official @anthropic-ai/sdk stream accumulator (snapshot.content.push -> 'Cannot read properties of undefined') on every translated streamed response, forcing clients into per-turn non-streaming fallback (double billing) and intermittently surfacing as a silent empty end-of-turn that terminates agent sessions mid-task. Fixed in all 6 message_start construction sites: openai_chat (streaming.rs), openai_responses (streaming_responses.rs x3, incl. the tool-call fallback path), gemini (streaming_gemini.rs x2). Adds a regression test per transform path asserting message_start carries the spec-required fields. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Token/cost/cache accounting was zeroed for streaming requests routed
through api_format="openai_chat" (e.g. gpt-5.5 via a LiteLLM gateway).
Two gaps, both mirroring handling the codex chat bridge already had:
- Request side (transform.rs): the Anthropic→OpenAI chat translation
copied `stream` but never injected `stream_options.include_usage`, so
OpenAI-compatible upstreams never emitted a usage chunk while streaming.
- Response side (streaming.rs): the final usage arrives in a trailing
`choices:[]` chunk *after* the finish_reason chunk, but the
`choices.first()` guard dropped it, leaving message_start/message_delta
at {input_tokens:0, output_tokens:0}. Cache usage from every chunk
before the guard and flush a single deferred message_delta (carrying
the final usage) at [DONE]/stream end.
Adds unit tests for both sides.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related fixes to the Claude
/v1/messages→ OpenAI chat-completions streaming path (api_format: "openai_chat"), the route used when Claude Code / an Anthropic SDK client is pointed at an OpenAI-compatible upstream (OpenAI, LiteLLM, DeepSeek, kimi, MiniMax, a local vLLM, etc.).Both problems were already solved on the Codex chat-bridge path (
transform_codex_chat.rs/streaming_codex_chat.rs) — including a comment there describing the exact usage failure — but the same handling was never applied to the plainopenai_chatpath. These commits port that existing, accepted approach over.Fix 1 — spec-complete
message_start(50059c4)message_startwas emitted withoutcontent/stop_reason/stop_sequence. The official Anthropic SDK stream accumulator doessnapshot.content.push(...)and throws "Cannot read properties of undefined" whencontentis missing, forcing clients into non-streaming fallback (or silent stream loss). Now the message snapshot carriescontent: [],stop_reason: null,stop_sequence: null.Fix 2 — streaming usage mapped correctly (
5f3d6a6)Streaming responses on this path reported
input_tokens/output_tokens/ cache tokens as 0, so any usage/cost/quota accounting downstream was blank. Two gaps, both mirroring the Codex path:transform.rs) — the Anthropic→OpenAI translation copiedstreambut never injectedstream_options.include_usage. OpenAI-compatible upstreams don't emit a usage chunk while streaming unless it's requested, so usage never came back at all.streaming.rs) — the final usage arrives in a trailingchoices: []chunk after thefinish_reasonchunk, but thechoices.first()guard dropped that chunk, somessage_start/message_deltastayed at{input_tokens: 0, output_tokens: 0}. Now usage is cached from every chunk before the guard (asstreaming_codex_chat.rsalready does withlatest_usage), and a single spec-correctmessage_deltacarrying the final cumulative usage is flushed at[DONE]/ stream end.This matches the canonical Anthropic wire behavior:
message_start.usageseeds the counts and the final (cumulative)message_delta.usagecarries the authoritative totals, which the SDK accumulator merges.Verification
anthropic_to_openai_stream_injects_include_usage,anthropic_to_openai_non_stream_omits_stream_options,streaming_usage_from_trailing_empty_choices_chunk_is_mapped(reproduces the trailingchoices: []+ usage shape).proxy_claude_streaming,proxy_claude_openai_chat,proxy_claude_response_parity,proxy_claude_forwarder_alignment) pass unchanged.input_tokens/output_tokens/ cache tokens, and downstream cost accounting is populated.Notes
message_deltais emitted once, just beforemessage_stop, and preserves the existingstop_reasonmapping.🤖 Generated with Claude Code