Skip to content

fix(llmobs): capture reasoning_content from streamed chat completions#18274

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 6 commits into
mainfrom
fix/llmobs-streamed-reasoning-content
May 27, 2026
Merged

fix(llmobs): capture reasoning_content from streamed chat completions#18274
gh-worker-dd-mergequeue-cf854d[bot] merged 6 commits into
mainfrom
fix/llmobs-streamed-reasoning-content

Conversation

@Yun-Kim
Copy link
Copy Markdown
Contributor

@Yun-Kim Yun-Kim commented May 27, 2026

Summary

Fixes #18257.

The OpenAI + Litellm streamed-chunk aggregator openai_construct_message_from_streamed_chunks never read delta.reasoning_content from streamed chunks, so OpenAI-compatible reasoning providers (DeepSeek-V3/V4, Qwen reasoning models on Baseten/Fireworks, etc.) had their reasoning text silently dropped on the LLM Obs span.

Both the OpenAI and LiteLLM integrations call this aggregator (ddtrace/contrib/internal/openai/utils.py:151, ddtrace/contrib/internal/litellm/utils.py:55) for streamed chat responses and pass the result straight to openai_set_meta_tags_from_chat, which already checks for reasoning_content messages but reasoning content messages were never constructed from the streamed response.

The fix checks for and accumulates delta.reasoning_content message chunks.

Notes for reviewers

  • The OpenAI Python SDK does not declare reasoning_content as a typed field on ChoiceDelta, but its BaseModel is configured with extra="allow" so the field passes through as an attribute when emitted by an OpenAI-compatible provider. LiteLLM's Delta type exposes reasoning_content directly.
  • Avoiding an E2E regression test for now because I don't have a deepseek API key 😢 but the unit tests should be sufficient to get this fix out.

Claude session: 948c6399-4afc-4ad8-acfc-d05db310902d
Resume: claude --resume 948c6399-4afc-4ad8-acfc-d05db310902d

The streamed-chunk aggregator never read delta.reasoning_content, so
OpenAI-compatible reasoning providers (DeepSeek, Qwen, etc.) had their
reasoning text silently dropped on the LLM Obs span while
reasoning_output_tokens was still reported. Add the missing
accumulation; downstream openai_set_meta_tags_from_chat already emits
the role: "reasoning" output message when the key is present.

Fixes #18257

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cit-pr-commenter-54b7da
Copy link
Copy Markdown

cit-pr-commenter-54b7da Bot commented May 27, 2026

Codeowners resolved as

ddtrace/llmobs/_integrations/utils.py                                   @DataDog/ml-observability
releasenotes/notes/fix-llmobs-streamed-reasoning-content-0da3242ccfaa6063.yaml  @DataDog/apm-python
tests/llmobs/test_integrations_utils.py                                 @DataDog/ml-observability

@datadog-official
Copy link
Copy Markdown
Contributor

datadog-official Bot commented May 27, 2026

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64, 1]   View in Datadog   GitLab

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. NotImplementedError: This version of CPython is not supported yet

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741491-d2b8243-musllinux_1_2_x86_64, 1]   View in Datadog   GitLab

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. NotImplementedError: This version of CPython is not supported yet

DataDog/apm-reliability/dd-trace-py | build linux serverless: [arm64, cp315-cp315, v113741357-d2b8243-manylinux2014_aarch64, 1]   View in Datadog   GitLab

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. NotImplementedError: This version of CPython is not supported yet during ddtrace import.

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 9b071db | Docs | Datadog PR Page | Give us feedback!

Comment thread releasenotes/notes/fix-llmobs-streamed-reasoning-content-0da3242ccfaa6063.yaml Outdated
Comment thread ddtrace/llmobs/_integrations/utils.py Outdated
Address review feedback: initialize reasoning_content as empty string
and pop if still empty at the end, matching how tool_calls is handled
in the same function.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread ddtrace/llmobs/_integrations/utils.py Outdated
Match the chunk_content pattern on the adjacent line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread tests/llmobs/test_integrations_utils.py Outdated
Comment thread tests/llmobs/test_integrations_utils.py Outdated
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ncybul ncybul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for fixing!

@Yun-Kim Yun-Kim marked this pull request as ready for review May 27, 2026 15:14
@Yun-Kim Yun-Kim requested review from a team as code owners May 27, 2026 15:14
@Yun-Kim Yun-Kim requested review from Kyle-Verhoog and r1viollet May 27, 2026 15:14
@Yun-Kim
Copy link
Copy Markdown
Contributor Author

Yun-Kim commented May 27, 2026

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented May 27, 2026

View all feedbacks in Devflow UI.

2026-05-27 16:46:06 UTC ℹ️ Start processing command /merge


2026-05-27 16:46:12 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 56m (p90).


2026-05-27 17:27:00 UTCMergeQueue: The checks failed on this merge request

Tests failed on this commit b0b5498:

What to do next?

  • Investigate the failures and when ready, re-add your pull request to the queue!
  • If your PR checks are green, try to rebase/merge. It might be because the CI run is a bit old.
  • Any question, go check the FAQ.

@github-actions
Copy link
Copy Markdown
Contributor

This change is marked for backport to 4.9 and it does not conflict with that branch.
The command used to test backporting was

git checkout 4.9 && git cherry-pick -x --mainline 1 853bc88ecb986f29cc2bbdedd5ef92f590da22b8

@github-actions
Copy link
Copy Markdown
Contributor

This change is marked for backport to 4.8 and it does not conflict with that branch.
The command used to test backporting was

git checkout 4.8 && git cherry-pick -x --mainline 1 853bc88ecb986f29cc2bbdedd5ef92f590da22b8

@Yun-Kim
Copy link
Copy Markdown
Contributor Author

Yun-Kim commented May 27, 2026

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented May 27, 2026

View all feedbacks in Devflow UI.

2026-05-27 17:59:46 UTC ℹ️ Start processing command /merge


2026-05-27 18:00:00 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-05-27 18:12:11 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 56m (p90).


2026-05-27 18:51:30 UTC ℹ️ MergeQueue: This merge request was merged

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit b4420fa into main May 27, 2026
611 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the fix/llmobs-streamed-reasoning-content branch May 27, 2026 18:51
github-actions Bot added a commit that referenced this pull request May 27, 2026
…#18274)

## Summary

Fixes #18257.

The OpenAI + Litellm streamed-chunk aggregator `openai_construct_message_from_streamed_chunks` never read `delta.reasoning_content` from streamed chunks, so OpenAI-compatible reasoning providers (DeepSeek-V3/V4, Qwen reasoning models on Baseten/Fireworks, etc.) had their reasoning text silently dropped on the LLM Obs span.

Both the OpenAI and LiteLLM integrations call this aggregator (`ddtrace/contrib/internal/openai/utils.py:151`, `ddtrace/contrib/internal/litellm/utils.py:55`) for streamed chat responses and pass the result straight to `openai_set_meta_tags_from_chat`, which already checks for `reasoning_content` messages but reasoning content messages were never constructed from the streamed response.

The fix checks for and accumulates `delta.reasoning_content` message chunks.

## Notes for reviewers

- The OpenAI Python SDK does not declare `reasoning_content` as a typed field on `ChoiceDelta`, but its `BaseModel` is configured with `extra="allow"` so the field passes through as an attribute when emitted by an OpenAI-compatible provider. LiteLLM's `Delta` type exposes `reasoning_content` directly.
- Avoiding an E2E regression test for now because I don't have a deepseek API key 😢 but the unit tests should be sufficient to get this fix out.

Claude session: `948c6399-4afc-4ad8-acfc-d05db310902d`
Resume: `claude --resume 948c6399-4afc-4ad8-acfc-d05db310902d`

Co-authored-by: yun.kim <yun.kim@datadoghq.com>
(cherry picked from commit b4420fa)

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
github-actions Bot added a commit that referenced this pull request May 27, 2026
…#18274)

## Summary

Fixes #18257.

The OpenAI + Litellm streamed-chunk aggregator `openai_construct_message_from_streamed_chunks` never read `delta.reasoning_content` from streamed chunks, so OpenAI-compatible reasoning providers (DeepSeek-V3/V4, Qwen reasoning models on Baseten/Fireworks, etc.) had their reasoning text silently dropped on the LLM Obs span.

Both the OpenAI and LiteLLM integrations call this aggregator (`ddtrace/contrib/internal/openai/utils.py:151`, `ddtrace/contrib/internal/litellm/utils.py:55`) for streamed chat responses and pass the result straight to `openai_set_meta_tags_from_chat`, which already checks for `reasoning_content` messages but reasoning content messages were never constructed from the streamed response.

The fix checks for and accumulates `delta.reasoning_content` message chunks.

## Notes for reviewers

- The OpenAI Python SDK does not declare `reasoning_content` as a typed field on `ChoiceDelta`, but its `BaseModel` is configured with `extra="allow"` so the field passes through as an attribute when emitted by an OpenAI-compatible provider. LiteLLM's `Delta` type exposes `reasoning_content` directly.
- Avoiding an E2E regression test for now because I don't have a deepseek API key 😢 but the unit tests should be sufficient to get this fix out.

Claude session: `948c6399-4afc-4ad8-acfc-d05db310902d`
Resume: `claude --resume 948c6399-4afc-4ad8-acfc-d05db310902d`

Co-authored-by: yun.kim <yun.kim@datadoghq.com>
(cherry picked from commit b4420fa)

Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
Yun-Kim added a commit that referenced this pull request May 27, 2026
… [backport 4.9] (#18287)

Backport #18274 to 4.9

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yun Kim <35776586+Yun-Kim@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLMObs: reasoning_content dropped from streamed chat completions (openai/litellm integrations)

2 participants