fix(proxy): spec-complete message_start + usage mapping on Claude→OpenAI chat streaming by lishuceo · Pull Request #322 · SaladDay/cc-switch-cli

lishuceo · 2026-07-02T10:33:18Z

Summary

Two related fixes to the Claude /v1/messages → OpenAI chat-completions streaming path (api_format: "openai_chat"), the route used when Claude Code / an Anthropic SDK client is pointed at an OpenAI-compatible upstream (OpenAI, LiteLLM, DeepSeek, kimi, MiniMax, a local vLLM, etc.).

Both problems were already solved on the Codex chat-bridge path (transform_codex_chat.rs / streaming_codex_chat.rs) — including a comment there describing the exact usage failure — but the same handling was never applied to the plain openai_chat path. These commits port that existing, accepted approach over.

Fix 1 — spec-complete `message_start` (`50059c4`)

message_start was emitted without content / stop_reason / stop_sequence. The official Anthropic SDK stream accumulator does snapshot.content.push(...) and throws "Cannot read properties of undefined" when content is missing, forcing clients into non-streaming fallback (or silent stream loss). Now the message snapshot carries content: [], stop_reason: null, stop_sequence: null.

Fix 2 — streaming usage mapped correctly (`5f3d6a6`)

Streaming responses on this path reported input_tokens / output_tokens / cache tokens as 0, so any usage/cost/quota accounting downstream was blank. Two gaps, both mirroring the Codex path:

Request side (transform.rs) — the Anthropic→OpenAI translation copied stream but never injected stream_options.include_usage. OpenAI-compatible upstreams don't emit a usage chunk while streaming unless it's requested, so usage never came back at all.
Response side (streaming.rs) — the final usage arrives in a trailing choices: [] chunk after the finish_reason chunk, but the choices.first() guard dropped that chunk, so message_start / message_delta stayed at {input_tokens: 0, output_tokens: 0}. Now usage is cached from every chunk before the guard (as streaming_codex_chat.rs already does with latest_usage), and a single spec-correct message_delta carrying the final cumulative usage is flushed at [DONE] / stream end.

This matches the canonical Anthropic wire behavior: message_start.usage seeds the counts and the final (cumulative) message_delta.usage carries the authoritative totals, which the SDK accumulator merges.

Verification

New unit tests: anthropic_to_openai_stream_injects_include_usage, anthropic_to_openai_non_stream_omits_stream_options, streaming_usage_from_trailing_empty_choices_chunk_is_mapped (reproduces the trailing choices: [] + usage shape).
Existing streaming unit + integration suites (proxy_claude_streaming, proxy_claude_openai_chat, proxy_claude_response_parity, proxy_claude_forwarder_alignment) pass unchanged.
Verified end-to-end against an OpenAI-compatible gateway: streaming requests that previously reported all-zero usage now report correct input_tokens / output_tokens / cache tokens, and downstream cost accounting is populated.

Notes

No behavior change for the Codex/Responses path or native Anthropic passthrough.
The deferred message_delta is emitted once, just before message_stop, and preserves the existing stop_reason mapping.

🤖 Generated with Claude Code

…p_sequence) The Anthropic streaming spec requires the message_start snapshot to carry "content": [], "stop_reason": null, "stop_sequence": null. Omitting "content" crashes the official @anthropic-ai/sdk stream accumulator (snapshot.content.push -> 'Cannot read properties of undefined') on every translated streamed response, forcing clients into per-turn non-streaming fallback (double billing) and intermittently surfacing as a silent empty end-of-turn that terminates agent sessions mid-task. Fixed in all 6 message_start construction sites: openai_chat (streaming.rs), openai_responses (streaming_responses.rs x3, incl. the tool-call fallback path), gemini (streaming_gemini.rs x2). Adds a regression test per transform path asserting message_start carries the spec-required fields. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Token/cost/cache accounting was zeroed for streaming requests routed through api_format="openai_chat" (e.g. gpt-5.5 via a LiteLLM gateway). Two gaps, both mirroring handling the codex chat bridge already had: - Request side (transform.rs): the Anthropic→OpenAI chat translation copied `stream` but never injected `stream_options.include_usage`, so OpenAI-compatible upstreams never emitted a usage chunk while streaming. - Response side (streaming.rs): the final usage arrives in a trailing `choices:[]` chunk *after* the finish_reason chunk, but the `choices.first()` guard dropped it, leaving message_start/message_delta at {input_tokens:0, output_tokens:0}. Cache usage from every chunk before the guard and flush a single deferred message_delta (carrying the final usage) at [DONE]/stream end. Adds unit tests for both sides. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lishuceo and others added 2 commits June 30, 2026 22:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(proxy): spec-complete message_start + usage mapping on Claude→OpenAI chat streaming#322

fix(proxy): spec-complete message_start + usage mapping on Claude→OpenAI chat streaming#322
lishuceo wants to merge 2 commits into
SaladDay:mainfrom
lishuceo:bench-fix-message-start-content

lishuceo commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lishuceo commented Jul 2, 2026

Summary

Fix 1 — spec-complete message_start (50059c4)

Fix 2 — streaming usage mapped correctly (5f3d6a6)

Verification

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Fix 1 — spec-complete `message_start` (`50059c4`)

Fix 2 — streaming usage mapped correctly (`5f3d6a6`)