Skip to content

fix(proxy): spec-complete message_start + usage mapping on Claude→OpenAI chat streaming#322

Open
lishuceo wants to merge 2 commits into
SaladDay:mainfrom
lishuceo:bench-fix-message-start-content
Open

fix(proxy): spec-complete message_start + usage mapping on Claude→OpenAI chat streaming#322
lishuceo wants to merge 2 commits into
SaladDay:mainfrom
lishuceo:bench-fix-message-start-content

Conversation

@lishuceo

@lishuceo lishuceo commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Two related fixes to the Claude /v1/messages → OpenAI chat-completions streaming path (api_format: "openai_chat"), the route used when Claude Code / an Anthropic SDK client is pointed at an OpenAI-compatible upstream (OpenAI, LiteLLM, DeepSeek, kimi, MiniMax, a local vLLM, etc.).

Both problems were already solved on the Codex chat-bridge path (transform_codex_chat.rs / streaming_codex_chat.rs) — including a comment there describing the exact usage failure — but the same handling was never applied to the plain openai_chat path. These commits port that existing, accepted approach over.

Fix 1 — spec-complete message_start (50059c4)

message_start was emitted without content / stop_reason / stop_sequence. The official Anthropic SDK stream accumulator does snapshot.content.push(...) and throws "Cannot read properties of undefined" when content is missing, forcing clients into non-streaming fallback (or silent stream loss). Now the message snapshot carries content: [], stop_reason: null, stop_sequence: null.

Fix 2 — streaming usage mapped correctly (5f3d6a6)

Streaming responses on this path reported input_tokens / output_tokens / cache tokens as 0, so any usage/cost/quota accounting downstream was blank. Two gaps, both mirroring the Codex path:

  • Request side (transform.rs) — the Anthropic→OpenAI translation copied stream but never injected stream_options.include_usage. OpenAI-compatible upstreams don't emit a usage chunk while streaming unless it's requested, so usage never came back at all.
  • Response side (streaming.rs) — the final usage arrives in a trailing choices: [] chunk after the finish_reason chunk, but the choices.first() guard dropped that chunk, so message_start / message_delta stayed at {input_tokens: 0, output_tokens: 0}. Now usage is cached from every chunk before the guard (as streaming_codex_chat.rs already does with latest_usage), and a single spec-correct message_delta carrying the final cumulative usage is flushed at [DONE] / stream end.

This matches the canonical Anthropic wire behavior: message_start.usage seeds the counts and the final (cumulative) message_delta.usage carries the authoritative totals, which the SDK accumulator merges.

Verification

  • New unit tests: anthropic_to_openai_stream_injects_include_usage, anthropic_to_openai_non_stream_omits_stream_options, streaming_usage_from_trailing_empty_choices_chunk_is_mapped (reproduces the trailing choices: [] + usage shape).
  • Existing streaming unit + integration suites (proxy_claude_streaming, proxy_claude_openai_chat, proxy_claude_response_parity, proxy_claude_forwarder_alignment) pass unchanged.
  • Verified end-to-end against an OpenAI-compatible gateway: streaming requests that previously reported all-zero usage now report correct input_tokens / output_tokens / cache tokens, and downstream cost accounting is populated.

Notes

  • No behavior change for the Codex/Responses path or native Anthropic passthrough.
  • The deferred message_delta is emitted once, just before message_stop, and preserves the existing stop_reason mapping.

🤖 Generated with Claude Code

lishuceo and others added 2 commits June 30, 2026 22:06
…p_sequence)

The Anthropic streaming spec requires the message_start snapshot to carry
"content": [], "stop_reason": null, "stop_sequence": null. Omitting
"content" crashes the official @anthropic-ai/sdk stream accumulator
(snapshot.content.push -> 'Cannot read properties of undefined') on every
translated streamed response, forcing clients into per-turn non-streaming
fallback (double billing) and intermittently surfacing as a silent empty
end-of-turn that terminates agent sessions mid-task.

Fixed in all 6 message_start construction sites: openai_chat (streaming.rs),
openai_responses (streaming_responses.rs x3, incl. the tool-call fallback
path), gemini (streaming_gemini.rs x2). Adds a regression test per transform
path asserting message_start carries the spec-required fields.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Token/cost/cache accounting was zeroed for streaming requests routed
through api_format="openai_chat" (e.g. gpt-5.5 via a LiteLLM gateway).
Two gaps, both mirroring handling the codex chat bridge already had:

- Request side (transform.rs): the Anthropic→OpenAI chat translation
  copied `stream` but never injected `stream_options.include_usage`, so
  OpenAI-compatible upstreams never emitted a usage chunk while streaming.

- Response side (streaming.rs): the final usage arrives in a trailing
  `choices:[]` chunk *after* the finish_reason chunk, but the
  `choices.first()` guard dropped it, leaving message_start/message_delta
  at {input_tokens:0, output_tokens:0}. Cache usage from every chunk
  before the guard and flush a single deferred message_delta (carrying
  the final usage) at [DONE]/stream end.

Adds unit tests for both sides.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant