fix(aiokafka): collect kafka cluster_id for DSM by piochelepiotr · Pull Request #18272 · DataDog/dd-trace-py

piochelepiotr · 2026-05-26T20:56:44Z

Summary

The aiokafka integration was missing kafka_cluster_id collection while confluent_kafka already collects it. As a result, DSM edges and produce/commit tracking from aiokafka producers/consumers were missing the kafka_cluster_id tag, breaking cluster-level aggregation in Data Streams Monitoring.
Added an async _get_cluster_id helper that sends a MetadataRequest_v5 to a connected broker and caches the result on the AIOKafkaClient (success + 5 min failure cache, mirroring the confluent_kafka helper).
Set the id as the kafka.cluster_id span tag and propagate it to the DSM hooks via core.set_item("kafka_cluster_id", ...), so consume/produce edge tags and track_kafka_produce / track_kafka_commit calls include it.

Test plan

Locally verified end-to-end with the DSM kafka-demo docker-compose, swapping the producer/consumer for aiokafka apps against confluentinc/cp-kafka:7.5.0. _get_cluster_id(p.client, "demo-orders") returned the broker's CLUSTER_ID (Mka3OEVBNTcwNTJENDM2Qg), and the Datadog agent successfully forwarded traces and data_streams_messages to datad0g.com.
CI: existing aiokafka + DSM test suites should still pass.

🤖 Generated with Claude Code

The aiokafka integration was not collecting the Kafka cluster_id, while the confluent_kafka integration was. This caused Data Streams Monitoring edges and offset commits from aiokafka producers/consumers to be missing the kafka_cluster_id tag, preventing cluster-level aggregation. Fetch cluster_id by sending a MetadataRequest_v5 to a broker the first time a producer/consumer is traced, cache the result on the AIOKafkaClient, and use a 5 minute failure cache to avoid repeated slow lookups (matching the confluent_kafka integration). The id is exposed both as the kafka.cluster_id span tag and via core.set_item("kafka_cluster_id", ...) so the DSM hooks include it in checkpoint edge tags and produce/commit tracking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cit-pr-commenter-54b7da · 2026-05-26T20:57:57Z

Codeowners resolved as

ddtrace/contrib/internal/aiokafka/patch.py                              @DataDog/apm-core-python @DataDog/apm-idm-python

datadog-prod-us1-5 · 2026-05-26T21:01:00Z

Tests

✨ Fix all issues with BitsAI

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | build linux: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64]

🔧 Fix in code (Fix with Cursor).
NotImplementedError: This version of CPython is not supported yet.

DataDog/apm-reliability/dd-trace-py | build linux: [arm64, cp315-cp315, v113741589-d2b8243-musllinux_1_2_aarch64]

🔧 Fix in code (Fix with Cursor).
NotImplementedError: This version of CPython is not supported yet during ddtrace import.

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64, 1]

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration.
NotImplementedError: This version of CPython is not supported yet

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

Useful? React with 👍 / 👎

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 9239ad4 | Docs | Datadog PR Page | Give us feedback!}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 446e7971d0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T21:02:17Z


 async def traced_commit(func, instance, args, kwargs):
    result = await func(*args, **kwargs)
+    cluster_id = await _get_cluster_id(getattr(instance, "_client", None), None)


Avoid all-topics metadata lookup in commit path

traced_commit always calls _get_cluster_id(..., None), and _get_cluster_id builds MetadataRequest_v5([] , False) when topic is None. In Kafka, an empty topic list requests metadata for the entire cluster, so the first commit on each consumer can trigger a large broker round-trip and add avoidable latency in a hot path (especially with many topics). This should use a concrete topic from commit offsets (as the confluent integration does) or skip cluster-id lookup when no topic is available.

Useful? React with 👍 / 👎.

Addressed in 9239ad4 — traced_commit now reads only the cached _dd_cluster_id and never issues a metadata request from the commit path.

For reference: an empty topics=[] in MetadataRequest_v5 actually means "no topic metadata, brokers/cluster only" — verified by encoding:

[] → \x00\x00\x00\x00 (array length 0, no topics)

None → \xff\xff\xff\xff (null, all topics)

So the produce/getone path doesn't fan out across all topics either. But the commit hot-path concern is fair — switched it to cached-only.

Tracking calls and DSM edge tags now include kafka_cluster_id, so the existing offset-lookup keys (with cluster_id="") and pathway-hash expectations no longer matched. Read the cluster id from the AIOKafkaClient and use it in PartitionKey / pathway tag expectations. Snapshots regenerated to include the new kafka.cluster_id span tag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per PR review: the commit handler should not issue a metadata round-trip. Use only the cached cluster id on the commit path — it is normally populated by a prior produce/consume; if not yet cached, skip the tag for that commit rather than block on a broker request. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

piochelepiotr requested review from a team as code owners May 26, 2026 20:56

piochelepiotr requested review from juanjux, quinna-h and vlad-scherbich May 26, 2026 20:56

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

piochelepiotr and others added 2 commits May 27, 2026 14:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(aiokafka): collect kafka cluster_id for DSM#18272

fix(aiokafka): collect kafka cluster_id for DSM#18272
piochelepiotr wants to merge 3 commits into
mainfrom
fix/aiokafka-cluster-id

piochelepiotr commented May 26, 2026

Uh oh!

cit-pr-commenter-54b7da Bot commented May 26, 2026 •

edited

Loading

Uh oh!

datadog-prod-us1-5 Bot commented May 26, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Uh oh!

piochelepiotr May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

piochelepiotr commented May 26, 2026

Summary

Test plan

Uh oh!

cit-pr-commenter-54b7da Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codeowners resolved as

Uh oh!

datadog-prod-us1-5 Bot commented May 26, 2026 • edited by datadog-datadog-prod-us1-2 Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

ℹ️ Info

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

piochelepiotr May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cit-pr-commenter-54b7da Bot commented May 26, 2026 •

edited

Loading

datadog-prod-us1-5 Bot commented May 26, 2026 •

edited by datadog-datadog-prod-us1-2 Bot

Loading