feat(llmobs): support agent-based LLMObs export via APM trace meta_struct by mabdinur · Pull Request #18254 · DataDog/dd-trace-py

mabdinur · 2026-05-23T07:35:04Z

Description

Goal: when LLM Observability data rides APM traces via meta_struct["_llmobs"] (APM_AGENT_PROXY / APM_AGENTLESS), stop losing LLMObs events on traces predicted to be dropped (root sampling_priority <= 0), where the Agent's local sampler or libdatadog short-circuits the chunk before it reaches the trace-edge.

Design constraints:

Zero impact on APM sampling decisions and billing (sampling_priority never mutated, no sampling rules added).
No dd-trace-py <-> libdatadog <-> Agent protocol changes; intake already extracts meta_struct["_llmobs"] at /v1/input and /api/v0.2/traces.
Exactly-once intake delivery via in-SDK de-dup (_dd.llmobs.submitted=1 tag + scrub meta_struct) -- intake can OR-dedup as a belt-and-suspenders fallback.
No new DD_API_KEY / DD_SITE requirement for APM_AGENT_PROXY mode.
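The exactly-once constraint is a contract between two delivery paths. An event reaches intake either through LLMObsSpanWriter or through APM-side extraction of meta_struct["_llmobs"], never both. The following is a minimal sketch of an intake-side check that assumes simplified span objects. `FakeSpan` and `apm_side_extract` are hypothetical names. Reading "OR-dedup" as "skip if the tag is set or the struct is missing" is an interpretation, not documented intake behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

LLMOBS_SUBMITTED_TAG = "_dd.llmobs.submitted"  # tag name from the PR description
LLMOBS_STRUCT_KEY = "_llmobs"                   # meta_struct key from the PR description


@dataclass
class FakeSpan:
    # Hypothetical stand-in for an APM span: string tags plus meta_struct.
    span_id: int
    meta: Dict[str, str] = field(default_factory=dict)
    meta_struct: Dict[str, Any] = field(default_factory=dict)


def apm_side_extract(spans: List[FakeSpan]) -> List[Any]:
    """Hypothetical extraction of LLMObs events from an APM trace payload.

    A span the SDK already shipped through the LLMObs writer carries the
    de-dup tag and has no meta_struct["_llmobs"]; either signal alone is
    enough to skip it.
    """
    events = []
    for span in spans:
        if span.meta.get(LLMOBS_SUBMITTED_TAG) == "1":
            continue  # already delivered by the SDK-side rescue
        event = span.meta_struct.get(LLMOBS_STRUCT_KEY)
        if event is not None:
            events.append(event)
    return events
```

Checking both signals makes a partially scrubbed span harmless: if the tag is set but meta_struct survived, intake still extracts nothing.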

Design:

LLMObs.enable() forces the APM trace writer to v0.4 via SpanAggregator.reset(llmobs_enabled=True), because v0.5 does not carry meta_struct. This mirrors the AppSec recreate hook.
_on_span_finish stashes the prepared LLMObsSpanEvent on the span (CACHED_LLMOBS_EVENT_CTX_KEY) instead of scrubbing meta_struct and enqueuing the event; meta_struct["_llmobs"] then rides the APM trace.
New LLMObsSamplingFallbackProcessor (slotted into SpanAggregator's hardcoded chain between TraceSamplingProcessor and TraceTagsProcessor) re-ships the cached event via LLMObsSpanWriter on predicted drop, stamps _dd.llmobs.submitted=1, and scrubs meta_struct["_llmobs"].
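The design above can be sketched as a toy processor chain. This models only the sampling and fallback steps, not the tags step or the real SpanAggregator. It assumes the first span in the list is the local root and that LLM spans have span_type "llm". `Span`, `sampling_processor`, `FallbackProcessor`, and `run_chain` are hypothetical names, not ddtrace APIs.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

log = logging.getLogger(__name__)

SUBMITTED_TAG = "_dd.llmobs.submitted"
STRUCT_KEY = "_llmobs"


@dataclass
class Span:
    # Hypothetical simplified span; cached_event plays the role of the
    # CACHED_LLMOBS_EVENT_CTX_KEY context item.
    name: str
    span_type: str = ""
    meta: Dict[str, str] = field(default_factory=dict)
    meta_struct: Dict[str, Any] = field(default_factory=dict)
    cached_event: Optional[dict] = None
    sampling_priority: Optional[int] = None


def sampling_processor(trace: List[Span]) -> List[Span]:
    # Stand-in for TraceSamplingProcessor: this toy version always rejects.
    trace[0].sampling_priority = -1
    return trace


class FallbackProcessor:
    """Rescue step; it must run after sampling so it sees the final priority."""

    def __init__(self, enqueue: Callable[[dict], None]) -> None:
        self._enqueue = enqueue  # stands in for LLMObsSpanWriter.enqueue

    def __call__(self, trace: List[Span]) -> List[Span]:
        root_priority = trace[0].sampling_priority  # assumes trace[0] is the local root
        if root_priority is None or root_priority > 0:
            return trace  # predicted keep: meta_struct rides the APM trace
        for span in trace:
            if span.span_type != "llm" or span.meta.get(SUBMITTED_TAG) == "1":
                continue
            try:
                self._rescue(span)
            except Exception:
                # APM trace flow must continue even if the rescue fails.
                log.warning("LLMObs rescue failed", exc_info=True)
        return trace  # same list; only LLMObs fields on spans change

    def _rescue(self, span: Span) -> None:
        event, span.cached_event = span.cached_event, None  # release the cache
        if event is None:
            span.meta_struct.pop(STRUCT_KEY, None)  # never ship a partial event
            return
        # Tag and scrub before enqueueing so a writer failure cannot leave
        # meta_struct on the span without the de-dup tag.
        span.meta[SUBMITTED_TAG] = "1"
        span.meta_struct.pop(STRUCT_KEY, None)
        self._enqueue(event)


def run_chain(trace: List[Span], processors: List[Callable[[List[Span]], List[Span]]]) -> List[Span]:
    for processor in processors:
        trace = processor(trace)
    return trace
```

Swapping the first two entries of the processor list would make the fallback read a priority that has not been set yet, which is why the chain position is a hard constraint.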

Edge cases handled:

LLMOBS_DIRECT mode (DD_APM_TRACING_ENABLED=false) and tracer.enabled == False (DD_TRACE_ENABLED=0) keep immediate-ship at _on_span_finish because the trace never reaches the processor chain.
Distributed traces: rescue reads span._local_root.context.sampling_priority so an upstream USER_REJECT is honored on the child service.
Idempotency: rescue early-returns if _dd.llmobs.submitted=1 is already set, so a re-flush or LLMOBS_DIRECT hook cannot cause duplicates.
Cached event missing (user processor dropped it or build failed): rescue scrubs meta_struct to avoid shipping a partial event to APM-side extract.
Non-LLM spans on the same trace are skipped (span_type != SpanTypes.LLM).
Explicit DD_TRACE_API_VERSION=v0.5 with LLMObs enabled: downgraded to v0.4, logging a warning.
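The rescue-related edge cases above condense into one decision function. This is a minimal sketch under stated assumptions. `Action` and `rescue_action` are hypothetical names. The caller is assumed to pass the local root's priority, so an upstream USER_REJECT (-1) propagated into the child service counts as a drop. Treating a missing priority as "keep" is also an assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SKIP = "skip"              # leave the span untouched
    SCRUB_ONLY = "scrub_only"  # drop meta_struct["_llmobs"], ship nothing
    RESCUE = "rescue"          # tag, scrub, then enqueue the cached event


def rescue_action(
    span_type: str,
    submitted_tag: Optional[str],
    local_root_priority: Optional[int],
    has_cached_event: bool,
) -> Action:
    """Hypothetical summary of the rescue edge cases listed above."""
    if span_type != "llm":
        return Action.SKIP  # non-LLM spans on the same trace
    if submitted_tag == "1":
        return Action.SKIP  # already shipped by a re-flush or the LLMOBS_DIRECT path
    if local_root_priority is None or local_root_priority > 0:
        return Action.SKIP  # predicted keep: meta_struct rides the APM trace
    if not has_cached_event:
        return Action.SCRUB_ONLY  # avoid a partial event reaching APM-side extract
    return Action.RESCUE
```

The order of the checks matters: the idempotency check comes before the priority check, so a span that was already shipped is never scrubbed or re-enqueued.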

Testing

tests/llmobs/test_sampling_fallback_processor.py covers wire-format forcing, every rescue trigger condition, idempotency, no-sampling-side-effect, processor chain wiring, and sampling-priority round-trip.
tests/llmobs/conftest.py installs an always-enqueue variant of the fallback processor so the mocked writer is still exercised on every LLM span finish.

Risks

Users who explicitly set DD_TRACE_API_VERSION=v0.5 and enable LLMObs are downgraded to v0.4 (with a warning); without this, the entire meta_struct payload would be lost on the wire. AppSec already takes the same approach.
LLMObsSpanEvent is cached on the span until trace-flush -- only for span_type == SpanTypes.LLM, released as soon as process_trace returns.
All new code paths swallow exceptions and log -- APM trace flow continues even if rescue raises.
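The v0.5 downgrade described under Risks reduces to a small version-resolution rule. The following is a hypothetical sketch, not the ddtrace implementation. The function name and its `explicit` flag are illustrative. The flag separates a user-set DD_TRACE_API_VERSION, which should warn, from an internally chosen default, which is switched quietly.

```python
import logging

log = logging.getLogger(__name__)


def resolve_trace_api_version(version: str, explicit: bool, llmobs_enabled: bool) -> str:
    """Return the trace encoding to use; v0.5 cannot carry meta_struct."""
    if llmobs_enabled and version == "v0.5":
        if explicit:
            log.warning(
                "DD_TRACE_API_VERSION=v0.5 does not carry meta_struct; "
                "using v0.4 because LLM Observability is enabled"
            )
        return "v0.4"
    return version
```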

Additional Notes

Removed _DD_LLMOBS_TEST_KEEP_META_STRUCT escape hatch (default APM_AGENT_PROXY now keeps meta_struct on the span).
DummyWriter.recreate, CIVisibilityWriter.recreate, LogWriter.recreate, AgentlessTraceWriter.recreate, and NativeWriter.recreate all accept llmobs_enabled: Optional[bool] to keep the TraceWriter interface in sync.

When LLM Observability runs in APM_AGENT_PROXY or APM_AGENTLESS mode, the LLMObs payload rides the APM span via meta_struct["_llmobs"], so a single trace carries both kinds of telemetry. That path silently loses the LLMObs event whenever the SDK's local sampler decides the trace should be dropped (root sampling_priority <= 0): the Agent's client-side stats / libdatadog short-circuits the trace before it reaches intake.

This change:

- Forces the APM trace writer to v0.4 whenever LLMObs is enabled and warns on explicit v0.5, since v0.5 does not carry meta_struct. Mirrors the AppSec recreate() pattern.
- Stops scrubbing meta_struct["_llmobs"] in _on_span_finish for APM_AGENT_PROXY / APM_AGENTLESS and stashes the prepared event on the span via a context key for later rescue.
- Adds LLMObsSamplingFallbackProcessor in SpanAggregator's hardcoded chain (after TraceSamplingProcessor, before TraceTagsProcessor): on predicted drop it re-ships the cached event via LLMObsSpanWriter, stamps _dd.llmobs.submitted=1 for idempotency, and scrubs meta_struct so APM-side extract and writer-side intake never double-count.
- Preserves the LLMOBS_DIRECT immediate-ship behavior when DD_APM_TRACING_ENABLED=false or DD_TRACE_ENABLED=0 (no trace flush ever runs, so the rescue chain would not fire).
- Never mutates sampling_priority or adds sampling rules; LLM Observability has zero impact on APM sampling decisions or billing.

The removed _DD_LLMOBS_TEST_KEEP_META_STRUCT escape hatch is no longer needed because the default APM_AGENT_PROXY behavior now keeps meta_struct on the span. The llmobs test fixture installs a test-only always-enqueue variant of the fallback processor, so the mocked writer is still exercised on every LLM span finish.

Co-authored-by: Cursor <cursoragent@cursor.com>

datadog-prod-us1-6 · 2026-05-23T07:36:03Z

Tests


⚠️ Warnings

🚦 17 Pipeline jobs failed

System Tests | tracer-release / End-to-end #1 / anthropic-py@0.75.0 1

This looks flaky and may succeed on retry.
13 failed tests. Error: Number (1) of traces not available from test agent, got 0.
🧪 13 Tests failed
tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create_content_block[False, anthropic-py@0.75.0] from system_tests_suite
ValueError: Number (1) of traces not available from test agent, got 0:
[]

self = <tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages object at 0x7fa668c64e90>
test_agent = <utils.docker_fixtures._test_agent.TestAgentAPI object at 0x7fa669d2e180>
test_client = <utils.docker_fixtures._test_clients._test_client_framework_integrations.FrameworkTestClientApi object at 0x7fa6584d8b00>

    @pytest.mark.parametrize("stream", [True, False])
    def test_create_content_block(self, test_agent: TestAgentAPI, test_client: FrameworkTestClientApi, *, stream: bool):
        with test_agent.vcr_context(stream=stream):
...
tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages.test_create_content_block[True, anthropic-py@0.75.0] from system_tests_suite
ValueError: Number (1) of traces not available from test agent, got 0:
[]

self = <tests.integration_frameworks.llm.anthropic.test_anthropic_llmobs.TestAnthropicLlmObsMessages object at 0x7fa668c650d0>
test_agent = <utils.docker_fixtures._test_agent.TestAgentAPI object at 0x7fa669cebb30>
test_client = <utils.docker_fixtures._test_clients._test_client_framework_integrations.FrameworkTestClientApi object at 0x7fa657bcaea0>

    @pytest.mark.parametrize("stream", [True, False])
    def test_create_content_block(self, test_agent: TestAgentAPI, test_client: FrameworkTestClientApi, *, stream: bool):
        with test_agent.vcr_context(stream=stream):
...
System Tests | tracer-release / End-to-end #1 / google_genai-py@1.55.0 1

This looks flaky and may succeed on retry.
19 failed tests due to insufficient traces available from test agent after requests.
🧪 7 Tests failed
tests.integration_frameworks.llm.google_genai.test_google_genai_llmobs.TestGoogleGenAiEmbedContent.test_embed_content_content_block_input[google_genai-py@1.55.0] from system_tests_suite
ValueError: Number (1) of traces not available from test agent, got 0:
[]

self = <tests.integration_frameworks.llm.google_genai.test_google_genai_llmobs.TestGoogleGenAiEmbedContent object at 0x7fb5d8b201a0>
test_agent = <utils.docker_fixtures._test_agent.TestAgentAPI object at 0x7fb5c724bc80>
test_client = <utils.docker_fixtures._test_clients._test_client_framework_integrations.FrameworkTestClientApi object at 0x7fb5c723d970>

    def test_embed_content_content_block_input(self, test_agent: TestAgentAPI, test_client: FrameworkTestClientApi):
        with test_agent.vcr_context():
            test_client.request(
...
tests.integration_frameworks.llm.google_genai.test_google_genai_llmobs.TestGoogleGenAiEmbedContent.test_embed_content[google_genai-py@1.55.0] from system_tests_suite
ValueError: Number (1) of traces not available from test agent, got 0:
[]

self = <tests.integration_frameworks.llm.google_genai.test_google_genai_llmobs.TestGoogleGenAiEmbedContent object at 0x7fb5d8b19880>
test_agent = <utils.docker_fixtures._test_agent.TestAgentAPI object at 0x7fb5c723d130>
test_client = <utils.docker_fixtures._test_clients._test_client_framework_integrations.FrameworkTestClientApi object at 0x7fb5c7228a40>

    def test_embed_content(self, test_agent: TestAgentAPI, test_client: FrameworkTestClientApi):
        with test_agent.vcr_context():
            test_client.request(
...
DataDog/apm-reliability/dd-trace-py | build linux: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64]

This looks flaky and may succeed on retry.
Failed to create pod sandbox due to network allocation failure. No IPs currently available on the node.

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.

Commit SHA: 0829a33

cit-pr-commenter-54b7da · 2026-05-23T07:36:51Z

Codeowners resolved as

tests/contrib/botocore/test_bedrock_agents.py                           @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability
tests/snapshots/tests.contrib.botocore.test_bedrock_agents.test_agent_invoke_with_step_spans.json  @DataDog/ml-observability

…queue

- LLMObs._child_after_fork now reinstalls LLMObsSamplingFallbackProcessor with the recreated post-fork LLMObsSpanWriter. The processor instance captured the pre-fork writer at enable() time, and that writer's background worker did not survive fork(), causing silent buffering in child processes.
- Match the rescue path and the LLMOBS_DIRECT immediate-ship path on the same set_tag -> scrub -> enqueue order, so a writer failure cannot leave a partial state where meta_struct["_llmobs"] still rides the APM trace without the de-dup tag.
- Release note: add an upgrade entry for the agentless APM sampling behavior change in 709084d (DD_TRACE_SAMPLE_RATE/SAMPLING_RULES/RATE_LIMIT are now honored in agentless mode).
- Tests: add a chain-order positional assertion, the no-cached-event rescue branch, processor returns trace unchanged, and the tracer-disabled + APM_AGENT_PROXY immediate-ship path.

Co-authored-by: Cursor <cursoragent@cursor.com>
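The post-fork fix in the first bullet follows a general pattern: a component that captured a writer at install time must be rebuilt against the child's new writer. The writer must also keep its position in the processor chain. The sketch below is hypothetical. `SpanWriter`, `FallbackProcessor`, and `LLMObsService` are illustrative stand-ins, and the toy chain uses strings for the other processors.

```python
import os
from typing import List


class SpanWriter:
    """Hypothetical stand-in for LLMObsSpanWriter; records the pid it was built in."""

    def __init__(self) -> None:
        self.pid = os.getpid()
        self.events: List[dict] = []

    def enqueue(self, event: dict) -> None:
        self.events.append(event)


class FallbackProcessor:
    def __init__(self, writer: SpanWriter) -> None:
        self.writer = writer  # captured once, at install time


class LLMObsService:
    """Hypothetical owner of the writer and of a toy processor chain."""

    def __init__(self, chain: list) -> None:
        self.chain = chain
        self.writer = SpanWriter()
        # Slot between the sampling and tags processors (index 1 in this toy chain).
        self.chain.insert(1, FallbackProcessor(self.writer))

    def child_after_fork(self) -> None:
        # The pre-fork writer's worker thread does not exist in the child, so a
        # processor still pointing at it would buffer events that never flush.
        self.writer = SpanWriter()
        for i, proc in enumerate(self.chain):
            if isinstance(proc, FallbackProcessor):
                self.chain[i] = FallbackProcessor(self.writer)  # keep chain position
```

In a standalone program, the hook could be registered with `os.register_at_fork(after_in_child=service.child_after_fork)` on POSIX systems. ddtrace has its own fork-hook machinery, so that registration call is only illustrative.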

Consolidate the comments introduced in this PR: drop narration of the code on the line below ("set the tag", "log the warning") and keep only edge cases or workarounds that cannot be inferred from the surrounding lines (writer rebind after fork, tag+scrub-before-enqueue atomicity, local-root priority, no-cached-event branch, chain-position constraint, v0.5 meta_struct strip). Co-authored-by: Cursor <cursoragent@cursor.com>

pr-commenter · 2026-05-23T08:39:24Z

Benchmarks

Benchmark execution time: 2026-05-25 03:52:58

Comparing candidate commit 0829a33 in PR branch munir/agentbased-llmo with baseline commit fd67a37 in branch main.

Found 0 performance improvements and 4 performance regressions! Performance is the same for 617 metrics, 10 unstable metrics.

scenario:iastaspects-stringio_aspect

🟥 execution_time [+667.144µs; +718.728µs] or [+17.338%; +18.678%]

scenario:iastaspectsospath-ospathbasename_aspect

🟥 execution_time [+102.717µs; +110.564µs] or [+24.084%; +25.924%]

scenario:span-start

🟥 execution_time [+1.463ms; +1.628ms] or [+9.365%; +10.419%]

scenario:telemetryaddmetric-1-count-metric-1-times

🟥 execution_time [+284.226ns; +319.210ns] or [+13.394%; +15.043%]

Rebind the fallback processor after mock writer swap in bedrock/MCP fixtures so meta_struct is not scrubbed under USER_REJECT. Set DD_APM_TRACING_ENABLED for the MCP distributed-tracing subprocess test. Add DD_API_KEY to LLMOBS_DIRECT subprocess tests and read tags before span finish scrubs meta_struct. Co-authored-by: Cursor <cursoragent@cursor.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0829a33d95


chatgpt-codex-connector · 2026-05-27T22:02:35Z

+        # the span (APM-side extract would duplicate without the dedup tag).
+        span.set_tag(LLMOBS_SUBMITTED_TAG_KEY, "1")
+        span._remove_struct_tag(LLMOBS_STRUCT.KEY)
+        self._llmobs_span_writer.enqueue(event)


Regenerate agentless fallback events before enqueueing

When _export_mode == APM_AGENTLESS and the root sampling priority is <= 0, this enqueues the cached event that was rendered for APM meta_struct extraction rather than for LLMObsSpanWriter. In that agentless path _prepare_llmobs_span_data has already applied APM-intake-only transformations such as replacing dotted tag keys, and _llmobs_tags/normalization omit error details that the direct span writer normally includes; rejected errored spans or spans with dotted user tags will therefore be rescued with mutated tags or missing error information. Cache a writer-shaped event separately, or rebuild the event before handing it to the span writer.
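One way to act on this suggestion is to cache both renderings at span finish, so the rescue path never reuses the APM-shaped payload. The sketch below is hypothetical. The dot-to-underscore key rewrite and the dropped `error` field stand in for the APM-intake-only transformations the review describes. The real transforms in _prepare_llmobs_span_data may differ, and `CachedEvents`, `to_apm_shape`, and `build_cached_events` are illustrative names.

```python
import copy
from dataclasses import dataclass


@dataclass
class CachedEvents:
    apm_struct: dict    # shaped for meta_struct["_llmobs"] (APM-side extract)
    writer_event: dict  # shaped for the LLMObs span writer (rescue path)


def to_apm_shape(event: dict) -> dict:
    # Hypothetical APM-intake-only transforms: rewrite dotted tag keys and
    # omit error details.
    shaped = copy.deepcopy(event)
    shaped["tags"] = {k.replace(".", "_"): v for k, v in event.get("tags", {}).items()}
    shaped.pop("error", None)
    return shaped


def build_cached_events(event: dict) -> CachedEvents:
    # Deep copies keep the two renderings independent of each other and of
    # the caller's original event.
    return CachedEvents(apm_struct=to_apm_shape(event), writer_event=copy.deepcopy(event))
```

The rescue would then enqueue `writer_event`, while `apm_struct` is what rides the trace. Rebuilding the writer event at flush time is the alternative the review mentions. It avoids holding two copies in memory but repeats the preparation work.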


KowalskiThomas · 2026-05-28T06:48:06Z

-            self._trace_rate_limit = -1
            self._trace_compute_stats = False
-            setattr(self, "_trace_sampling_rules", "")


Why have those things been removed?

mabdinur and others added 2 commits May 23, 2026 02:26

support sampling when agentless is enabled

709084d

mabdinur changed the title from "fix(llmobs): rescue LLMObs events when APM trace is predicted dropped" to "fix(llmobs): eliminate LLMObs data loss on unsampled APM traces" May 23, 2026

mabdinur changed the title from "fix(llmobs): eliminate LLMObs data loss on unsampled APM traces" to "feat(llmobs): support agent-based LLMObs export via APM trace meta_struct" May 23, 2026

mabdinur and others added 2 commits May 23, 2026 04:03

mabdinur and others added 2 commits May 23, 2026 14:03

fix failing llmo tests

0829a33

mabdinur mentioned this pull request May 25, 2026

feat(agent): extract LLMObs spans from v0.4 trace meta_struct DataDog/dd-apm-test-agent#370 (Draft)

mabdinur marked this pull request as ready for review May 27, 2026 21:58

mabdinur requested review from a team as code owners May 27, 2026 21:58

mabdinur requested review from emmettbutler and rachelyangdog May 27, 2026 21:58

chatgpt-codex-connector Bot reviewed May 27, 2026


KowalskiThomas reviewed May 28, 2026

mabdinur wants to merge 6 commits into main from munir/agentbased-llmo
