feat(google_cloud_pubsub): add Data Streams Monitoring support #18271
Open: J0ns0x wants to merge 4 commits into main from feat/google-cloud-pubsub-dsm
94b0c4f feat(google_cloud_pubsub): add Data Streams Monitoring support (J0ns0x)
fd1fb45 Add type annotations to google_cloud_pubsub DSM module; fix suitespec (J0ns0x)
ddc0814 Merge branch 'main' into feat/google-cloud-pubsub-dsm (J0ns0x)
f79998e Merge branch 'main' into feat/google-cloud-pubsub-dsm (J0ns0x)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| from typing import Any, Dict, Optional, Tuple | ||
|
|
||
| from ddtrace import config | ||
| from ddtrace.internal import core | ||
| from ddtrace.internal.datastreams.processor import DsmPathwayCodec | ||
| from ddtrace.internal.datastreams.utils import _calculate_byte_size | ||
| from ddtrace.internal.logger import get_logger | ||
| from ddtrace.internal.utils import get_argument_value | ||
|
|
||
|
|
||
| log = get_logger(__name__) | ||
|
|
||
| # Reserved kwargs on Publisher.publish that are not message attributes. | ||
| _PUBLISH_RESERVED_KWARGS = frozenset({"data", "ordering_key", "retry", "timeout"}) | ||
|
|
||
|
|
||
| def _extract_publish_attributes(kwargs: Dict[str, Any]) -> Dict[str, Any]: | ||
| return {k: v for k, v in kwargs.items() if k not in _PUBLISH_RESERVED_KWARGS} | ||
|
|
||
|
|
||
| def dsm_pubsub_send(args: Tuple[Any, ...], kwargs: Dict[str, Any], span: Optional[Any]) -> None: | ||
| from . import data_streams_processor as processor | ||
|
|
||
| topic = get_argument_value(args, kwargs, 0, "topic") | ||
| data = get_argument_value(args, kwargs, 1, "data", optional=True) | ||
| ordering_key = kwargs.get("ordering_key", "") | ||
| attributes = _extract_publish_attributes(kwargs) | ||
|
|
||
| payload_size = 0 | ||
| payload_size += _calculate_byte_size(data) | ||
| payload_size += _calculate_byte_size(ordering_key) | ||
| payload_size += _calculate_byte_size(attributes) | ||
|
|
||
| edge_tags = ["direction:out", f"topic:{topic}", "type:google-pubsub"] | ||
| ctx = processor().set_checkpoint(edge_tags, payload_size=payload_size, span=span) | ||
| # Pub/Sub message attributes are passed as **kwargs to Publisher.publish(). | ||
| # Python's varkwargs accepts hyphenated keys like "dd-pathway-ctx-base64" | ||
| # when unpacked, so injecting into the kwargs dict propagates the pathway | ||
| # context to the broker. The existing distributed-tracing HTTPPropagator.inject | ||
| # call uses the same mechanism (see _on_pubsub_send_start in trace_handlers.py). | ||
| DsmPathwayCodec.encode(ctx, kwargs) | ||
|
|
||
|
|
||
| def dsm_pubsub_receive(subscription: str, message: Any, span: Optional[Any]) -> None: | ||
| from . import data_streams_processor as processor | ||
|
|
||
| attributes = dict(message.attributes) if message.attributes else {} | ||
|
|
||
| payload_size = 0 | ||
| payload_size += _calculate_byte_size(getattr(message, "data", None)) | ||
| payload_size += _calculate_byte_size(getattr(message, "ordering_key", "") or "") | ||
| payload_size += _calculate_byte_size(attributes) | ||
|
|
||
| ctx = DsmPathwayCodec.decode(attributes, processor()) | ||
| # AIDEV-NOTE: dd-trace-py uses the `topic:` tag key as the generic destination | ||
| # identifier for every messaging integration (Kafka, Kinesis, SQS, SNS, RabbitMQ). | ||
| # The *value* on the consumer side is the Pub/Sub subscription path, which | ||
| # preserves fan-out distinction (multiple subscriptions on the same topic each | ||
| # produce a distinct pathway node) while keeping the tag schema consistent with | ||
| # the rest of the Python DSM integrations. Note that dd-trace-java uses a | ||
| # dedicated `subscription:` tag key here instead; the wire-format pathway hash | ||
| # is unaffected by that difference. | ||
| edge_tags = ["direction:in", f"topic:{subscription}", "type:google-pubsub"] | ||
| ctx.set_checkpoint(edge_tags, payload_size=payload_size, span=span) | ||
|
|
||
|
|
||
| if config._data_streams_enabled: | ||
| core.on("google_cloud_pubsub.send.pre", dsm_pubsub_send) | ||
| core.on("google_cloud_pubsub.receive.pre", dsm_pubsub_receive) |
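The comment in `dsm_pubsub_send` relies on a property of Python keyword unpacking: a dict key that is not a valid identifier (such as `dd-pathway-ctx-base64`) is still accepted by a `**kwargs` parameter when the dict is unpacked at the call site. A minimal sketch of that mechanism follows. `fake_publish` is a hypothetical stand-in, not the real `Publisher.publish`, and the injected value is a placeholder rather than an encoded pathway context.

```python
from typing import Any, Dict

# Same reserved set as in the diff above.
_PUBLISH_RESERVED_KWARGS = frozenset({"data", "ordering_key", "retry", "timeout"})


def fake_publish(topic: str, data: bytes, **attrs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for Publisher.publish.

    Returns only the kwargs that would travel as message attributes,
    i.e. everything that is not a reserved publish option.
    """
    return {k: v for k, v in attrs.items() if k not in _PUBLISH_RESERVED_KWARGS}


# kwargs as the instrumentation hook would see them before the call.
kwargs: Dict[str, Any] = {"ordering_key": "k", "custom_key": "custom_value"}

# Inject a hyphenated key. Writing fake_publish(..., dd-pathway-ctx-base64=...)
# would be a syntax error, but unpacking the dict with ** is accepted.
kwargs["dd-pathway-ctx-base64"] = "<encoded pathway>"  # placeholder value

received = fake_publish("projects/p/topics/t", b"payload", **kwargs)
print(received)
```

Because the hook mutates the same `kwargs` dict that is later unpacked into the call, the injected key reaches the callee without any change to the caller's code.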
8 changes: 8 additions & 0 deletions
releasenotes/notes/google-cloud-pubsub-dsm-ad88122f16faba8a.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| --- | ||
| features: | ||
| - | | ||
| google_cloud_pubsub: This introduces Data Streams Monitoring (DSM) context | ||
| propagation for the Google Cloud Pub/Sub integration. Producer publish | ||
| operations inject the DSM pathway context into message attributes, and | ||
| subscriber callbacks extract it and record a consume checkpoint. To enable, | ||
| set ``DD_DATA_STREAMS_ENABLED=true``. |
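The inject-then-extract flow described in this release note, together with the fan-out behaviour mentioned in the `dsm_pubsub_receive` comment, can be illustrated with a toy model. This is a sketch under stated assumptions: `toy_hash`, `produce`, and `consume` are hypothetical helpers, the hash is SHA-256 based and is not ddtrace's real pathway hash, and the attribute stores a plain integer string rather than the encoded context.

```python
import hashlib
from typing import Dict, List


def toy_hash(tags: List[str], parent: int) -> int:
    """Toy stand-in for a pathway hash (NOT ddtrace's algorithm).

    Combines the sorted edge tags with the parent hash so that a child
    node depends on both its own tags and the node it came from.
    """
    material = "|".join(sorted(tags)) + f"#{parent}"
    return int.from_bytes(hashlib.sha256(material.encode()).digest()[:8], "little")


def produce(topic: str, attributes: Dict[str, str]) -> int:
    """Record a producer checkpoint and inject its hash into the attributes."""
    h = toy_hash(["direction:out", f"topic:{topic}", "type:google-pubsub"], 0)
    attributes["dd-pathway-ctx-base64"] = str(h)  # placeholder encoding
    return h


def consume(subscription: str, attributes: Dict[str, str]) -> int:
    """Extract the parent hash and record a consumer checkpoint."""
    parent = int(attributes.get("dd-pathway-ctx-base64", "0"))
    tags = ["direction:in", f"topic:{subscription}", "type:google-pubsub"]
    return toy_hash(tags, parent)


attrs = {"custom_key": "custom_value"}
parent_hash = produce("projects/p/topics/t", attrs)

# Two subscriptions on the same topic receive the same attributes.
child_a = consume("projects/p/subscriptions/a", dict(attrs))
child_b = consume("projects/p/subscriptions/b", dict(attrs))
print(child_a != child_b)
```

Since the subscription path is part of the consumer's edge tags, each subscription yields its own child node even though both share one parent.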
128 changes: 128 additions & 0 deletions
tests/contrib/google_cloud_pubsub/test_google_cloud_pubsub_dsm.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,128 @@ | ||
| import threading | ||
|
|
||
| import pytest | ||
|
|
||
| from ddtrace.internal.datastreams import data_streams_processor | ||
| from ddtrace.internal.datastreams.processor import PROPAGATION_KEY_BASE_64 | ||
| from ddtrace.internal.datastreams.processor import DataStreamsCtx | ||
| from ddtrace.internal.native import DDSketch | ||
| from tests.datastreams.utils import all_pathway_stat_keys | ||
|
|
||
|
|
||
| DSM_TEST_PATH_HEADER_SIZE = 28 | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def dsm_processor(): | ||
| processor = data_streams_processor(reset=True) | ||
| assert processor is not None, "Data Streams Monitoring is not enabled" | ||
| yield processor | ||
| processor.shutdown(timeout=5) | ||
|
|
||
|
|
||
| def _wait_for_pathway_directions(processor, *required, timeout=10.0): | ||
| """Poll DSM buckets until checkpoints for each required direction tag appear.""" | ||
| import time | ||
|
|
||
| deadline = time.time() + timeout | ||
| while time.time() < deadline: | ||
| tag_strs = [" ".join(key[0]) for key in all_pathway_stat_keys(processor)] | ||
| if all(any(req in tags for tags in tag_strs) for req in required): | ||
| return | ||
| time.sleep(0.1) | ||
| raise AssertionError(f"timed out waiting for DSM checkpoints {required}; saw: {tag_strs}") | ||
|
|
||
|
|
||
| def test_dsm_payload_size_produce(dsm_processor, publisher, topic_path): | ||
| """Producer pathway records a payload size that accounts for data, attributes, and the injected pathway header.""" | ||
| payload = b"data streams hello" | ||
| test_attrs = {"custom_key": "custom_value"} | ||
| publisher.publish(topic_path, payload, **test_attrs).result(timeout=10) | ||
|
|
||
| _wait_for_pathway_directions(dsm_processor, "direction:out") | ||
|
|
||
| # Verify a non-zero payload-size sketch was recorded on the producer pathway. | ||
| found_produce_sketch = False | ||
| with dsm_processor._lock: | ||
| for bucket in dsm_processor._buckets.values(): | ||
| for key, stats in bucket.pathway_stats.items(): | ||
| tags = key[0] | ||
| if "direction:out" in tags and any(t.startswith("topic:") for t in tags): | ||
| assert "type:google-pubsub" in tags | ||
| assert stats.payload_size.count >= 1 | ||
| found_produce_sketch = True | ||
| assert found_produce_sketch, "expected producer pathway sketch not recorded" | ||
|
|
||
|
|
||
| def test_dsm_pathway_linkage(dsm_processor, publisher, topic_path, subscriber, subscription_path): | ||
| """Publishing then subscribing produces linked producer→consumer pathway hashes with the expected tag schema.""" | ||
| received = threading.Event() | ||
|
|
||
| def callback(message): | ||
| message.ack() | ||
| received.set() | ||
|
|
||
| future = subscriber.subscribe(subscription_path, callback=callback) | ||
| try: | ||
| publisher.publish(topic_path, b"data streams hello").result(timeout=10) | ||
| assert received.wait(timeout=10), "timed out waiting for subscriber callback" | ||
| finally: | ||
| future.cancel() | ||
| future.result(timeout=5) | ||
|
|
||
| _wait_for_pathway_directions(dsm_processor, "direction:out", "direction:in") | ||
|
|
||
| ctx = DataStreamsCtx(dsm_processor, 0, 0, 0) | ||
| parent_hash = ctx._compute_hash( | ||
| sorted(["direction:out", f"topic:{topic_path}", "type:google-pubsub"]), | ||
| 0, | ||
| ) | ||
| child_hash = ctx._compute_hash( | ||
| sorted(["direction:in", f"topic:{subscription_path}", "type:google-pubsub"]), | ||
| parent_hash, | ||
| ) | ||
| hash_pairs = {(key[1], key[2]) for key in all_pathway_stat_keys(dsm_processor)} | ||
| assert (parent_hash, 0) in hash_pairs, f"producer hash missing; saw {hash_pairs}" | ||
| assert (child_hash, parent_hash) in hash_pairs, f"consumer hash missing; saw {hash_pairs}" | ||
|
|
||
|
|
||
| def test_dsm_pathway_header_injected_on_publish(dsm_processor, publisher, topic_path, subscriber, subscription_path): | ||
| """The dd-pathway-ctx-base64 attribute is injected into published messages and survives the round trip.""" | ||
| publisher.publish(topic_path, b"data streams hello", custom_key="custom_value").result(timeout=10) | ||
|
|
||
| response = subscriber.pull(subscription=subscription_path, max_messages=1, timeout=10) | ||
| assert len(response.received_messages) == 1 | ||
| attributes = dict(response.received_messages[0].message.attributes) | ||
| assert PROPAGATION_KEY_BASE_64 in attributes | ||
| assert attributes[PROPAGATION_KEY_BASE_64] | ||
| # User-provided attributes are preserved alongside the injected pathway key. | ||
| assert attributes["custom_key"] == "custom_value" | ||
|
|
||
|
|
||
| def test_dsm_payload_size_matches_expected(dsm_processor, publisher, topic_path): | ||
| """With distributed tracing disabled, payload size = data + attrs + injected pathway key + path header bytes.""" | ||
| from tests.utils import override_config | ||
|
|
||
| payload = b"abcdef" # 6 bytes | ||
| test_attrs = {"k1": "v1"} # 2 + 2 = 4 bytes of attribute content | ||
| with override_config("google_cloud_pubsub", dict(distributed_tracing_enabled=False)): | ||
| publisher.publish(topic_path, payload, **test_attrs).result(timeout=10) | ||
|
|
||
| _wait_for_pathway_directions(dsm_processor, "direction:out") | ||
|
|
||
| expected_payload_size = float(len(payload) + 4 + len(PROPAGATION_KEY_BASE_64) + DSM_TEST_PATH_HEADER_SIZE) | ||
| expected_sketch = DDSketch() | ||
| expected_sketch.add(expected_payload_size) | ||
| expected_proto = expected_sketch.to_proto() | ||
|
|
||
| with dsm_processor._lock: | ||
| produce_stats = [ | ||
| stats | ||
| for bucket in dsm_processor._buckets.values() | ||
| for key, stats in bucket.pathway_stats.items() | ||
| if "direction:out" in key[0] | ||
| ] | ||
| assert len(produce_stats) >= 1 | ||
| for stats in produce_stats: | ||
| assert stats.payload_size.count >= 1 | ||
| assert stats.payload_size.to_proto() == expected_proto |
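The expected size in `test_dsm_payload_size_matches_expected` can be worked through by hand. The sketch below assumes that `PROPAGATION_KEY_BASE_64` equals `"dd-pathway-ctx-base64"` (the name used in the test docstrings above) and that attribute size is counted as key length plus value length, as the inline test comment implies. Neither assumption is confirmed by the diff itself.

```python
payload = b"abcdef"                          # 6 bytes
attrs = {"k1": "v1"}                         # 2 + 2 = 4 bytes of attribute content
propagation_key = "dd-pathway-ctx-base64"    # assumed value of PROPAGATION_KEY_BASE_64
path_header_size = 28                        # DSM_TEST_PATH_HEADER_SIZE in the test module

# Assumed sizing rule: each attribute contributes len(key) + len(value).
attr_bytes = sum(len(k) + len(v) for k, v in attrs.items())

expected_payload_size = (
    len(payload) + attr_bytes + len(propagation_key) + path_header_size
)
print(expected_payload_size)  # 6 + 4 + 21 + 28 = 59
```

The empty `ordering_key` adds nothing to this total, which is why it does not appear in the test's formula.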
Any reason to have this? Wouldn't it be preferable to keep it enabled? Or is the point that the existing untyped code is now being type-checked because the file was changed, and that is what surfaces the issues?

Good catch. I've addressed this in fd1fb45. The new module now has full type annotations on all three function signatures (`_extract_publish_attributes`, `dsm_pubsub_send`, `dsm_pubsub_receive`), so I was able to drop `disallow_untyped_defs` and `disallow_incomplete_defs` from the stanza. The one remaining exemption (`disallow_untyped_calls = false`) is still needed because the functions call into untyped helpers like `data_streams_processor()` and `DsmPathwayCodec`, which is the same reason the other DSM integrations (kafka, kombu, botocore) carry that flag.
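
For readers unfamiliar with the remaining flag, the situation described in the reply looks roughly like the sketch below. Both function names are hypothetical and stand in for the untyped DSM helpers and the newly annotated module functions. With mypy's `disallow_untyped_calls` enabled, the call inside `typed_caller` is reported, even though the code runs fine.

```python
from typing import Any, Dict


def untyped_helper(attributes):
    # No annotations: mypy treats this function as untyped.
    return dict(attributes)


def typed_caller(attributes: Dict[str, Any]) -> Dict[str, Any]:
    # Fully annotated, so disallow_untyped_defs and disallow_incomplete_defs
    # are satisfied. The call below is what disallow_untyped_calls flags,
    # because it invokes an untyped function from typed code.
    return untyped_helper(attributes)


result = typed_caller({"custom_key": "custom_value"})
print(result)
```

Annotating the helpers themselves would remove the need for the exemption, but that is a change to shared code outside the scope of this PR.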