Open
Changes from 15 commits
38 commits
6490021
+ otel sync tracing support
tewbo Mar 21, 2026
5998749
+ add async spans
tewbo Mar 21, 2026
db01212
+ test and refactor
tewbo Mar 24, 2026
acdc32f
* format
tewbo Mar 24, 2026
7bf72a9
* add otel to test requirements
tewbo Mar 24, 2026
74cc57d
fix black checkstyle
tewbo Mar 24, 2026
de1d6d9
fix flake8 checkstyle
tewbo Mar 24, 2026
3dda417
make property from driver config
tewbo Mar 24, 2026
7c620ae
Merge remote-tracking branch 'upstream/main' into otel-tracing-support
tewbo Apr 4, 2026
b574b77
add docs and fix pr review comments
tewbo Apr 9, 2026
7af5e2c
fix checkstyle and tests
tewbo Apr 9, 2026
3e6b95e
ci: retry failed workflow
tewbo Apr 9, 2026
350b3b6
feat(opentelemetry): retry-policy spans and per-node peer attributes
KirillKurdyukov Apr 20, 2026
3e55d61
refactor(opentelemetry): inline retry spans into ydb.retries
KirillKurdyukov Apr 20, 2026
e11b180
refactor(opentelemetry): peer from endpoint map; add ydb.node.dc; dro…
KirillKurdyukov Apr 20, 2026
70b778d
fix issue
KirillKurdyukov May 1, 2026
c66205a
fix issue
KirillKurdyukov May 1, 2026
29a76ac
Merge remote-tracking branch 'origin/main' into otel-tracing-support
KirillKurdyukov May 1, 2026
bce0e02
fix issue
KirillKurdyukov May 1, 2026
60b9a58
fix issue
KirillKurdyukov May 1, 2026
31b2cf2
fix issue
KirillKurdyukov May 2, 2026
763052a
fix issue
KirillKurdyukov May 2, 2026
dd60ff0
fix issue
KirillKurdyukov May 2, 2026
93fa974
fix issue
KirillKurdyukov May 2, 2026
c4e304d
fix issue
KirillKurdyukov May 2, 2026
0beb6d0
added ydb.BeginTransaction
KirillKurdyukov May 2, 2026
0220d7e
added healthcheck
KirillKurdyukov May 2, 2026
e6721d0
micro refactoring
KirillKurdyukov May 2, 2026
135916b
added tests
KirillKurdyukov May 3, 2026
b5264e2
refactoring
KirillKurdyukov May 3, 2026
75f95ea
fix linter
KirillKurdyukov May 3, 2026
9ec6959
fix linter
KirillKurdyukov May 3, 2026
515d57c
fix linter
KirillKurdyukov May 3, 2026
fc01331
fix linter
KirillKurdyukov May 3, 2026
e6deab1
fix issue
tewbo May 4, 2026
c72eb17
Refactor code
vgvoleg May 7, 2026
ac91ee6
Update test_tracing_async.py
vgvoleg May 7, 2026
febc9e3
review fixes
vgvoleg May 7, 2026
17 changes: 16 additions & 1 deletion docs/index.rst
Expand Up @@ -26,6 +26,12 @@ Python client for `YDB <https://ydb.tech/>`_ — a fault-tolerant distributed SQ
   coordination
   scheme

.. toctree::
   :hidden:
   :caption: Observability

   opentelemetry

.. toctree::
   :hidden:
   :caption: Reference
@@ -82,7 +88,7 @@ Distributed Coordination
------------------------

The :doc:`coordination` page covers distributed semaphores and leader election. If you
need to limit concurrent access to a shared resource across multiple processes or hosts,
this is the service to use.

Schema Management
@@ -103,6 +109,15 @@ use the ``@ydb_retry`` decorator. Skipping this section is a common source of pr
incidents.


Observability
-------------

The :doc:`opentelemetry` page explains how to add distributed tracing to your
application using OpenTelemetry. One call to ``enable_tracing()`` instruments
query sessions, transactions, and connection pool operations — so you can
visualize request flow in Jaeger, Grafana, or any OpenTelemetry-compatible backend.


API Reference
-------------

233 changes: 233 additions & 0 deletions docs/opentelemetry.rst
@@ -0,0 +1,233 @@
OpenTelemetry Tracing
=====================

The SDK provides built-in distributed tracing via `OpenTelemetry <https://opentelemetry.io/>`_.
When enabled, key YDB operations — such as session creation, query execution, transaction
commit/rollback, and driver initialization — produce OpenTelemetry spans. Trace
context is automatically propagated to the YDB server through gRPC metadata using the
`W3C Trace Context <https://www.w3.org/TR/trace-context/>`_ standard.

Tracing is **zero-cost when disabled**: the SDK uses no-op stubs by default, so there is
no overhead unless you explicitly opt in.
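
The no-op default can be pictured with the sketch below. It is not the SDK's actual code: ``_NoopSpan``, ``_RecordingSpan``, and ``_Registry`` are hypothetical names that only illustrate why a disabled tracer costs little more than a method call on a shared object.

```python
class _NoopSpan:
    """Hypothetical stand-in span: every method does nothing."""

    def set_attribute(self, key, value):
        pass

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions


_NOOP_SPAN = _NoopSpan()  # one shared instance, so no per-call allocation


class _RecordingSpan(_NoopSpan):
    """Toy 'real' span that remembers its name and attributes."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


class _Registry:
    """Hypothetical holder for the active span factory."""

    def __init__(self):
        self._factory = None  # None means tracing is disabled

    def enable(self, factory):
        self._factory = factory

    def start_span(self, name):
        if self._factory is None:
            return _NOOP_SPAN
        return self._factory(name)


registry = _Registry()

# Disabled: the same shared no-op object comes back every time.
with registry.start_span("ydb.ExecuteQuery") as span:
    span.set_attribute("db.system.name", "ydb")
disabled_is_noop = span is _NOOP_SPAN

# Enabled: spans are real objects that keep what they are given.
registry.enable(_RecordingSpan)
with registry.start_span("ydb.ExecuteQuery") as span:
    span.set_attribute("db.system.name", "ydb")
recorded = span.attributes
```

Call sites look the same in both modes, which is what lets instrumentation stay in the hot path without a per-call feature check in user code.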


Installation
------------

OpenTelemetry packages are not included by default. Install the SDK with the
``opentelemetry`` extra:

.. code-block:: sh

    pip install ydb[opentelemetry]

This pulls in ``opentelemetry-api``. You will also need ``opentelemetry-sdk`` and an
exporter for your tracing backend, for example:

.. code-block:: sh

    # OTLP/gRPC exporter (works with Jaeger, Tempo, and others)
    pip install opentelemetry-exporter-otlp-proto-grpc


Enabling Tracing
----------------

Call ``enable_tracing()`` once, **after** configuring your OpenTelemetry tracer provider
and **before** creating a ``Driver``:

.. code-block:: python

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource

    import ydb
    from ydb.opentelemetry import enable_tracing

    # 1. Set up OpenTelemetry
    resource = Resource(attributes={"service.name": "my-service"})
    provider = TracerProvider(resource=resource)
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
    )
    trace.set_tracer_provider(provider)

    # 2. Enable YDB tracing
    enable_tracing()

    # 3. Use the SDK as usual; spans are created automatically
    with ydb.Driver(endpoint="grpc://localhost:2136", database="/local") as driver:
        driver.wait(timeout=5)
        with ydb.QuerySessionPool(driver) as pool:
            pool.execute_with_retries("SELECT 1")

    provider.shutdown()

``enable_tracing()`` accepts an optional ``tracer`` argument. If omitted, the SDK
obtains a tracer named ``"ydb.sdk"`` from the global tracer provider.


What Is Instrumented
--------------------

The following operations produce spans:

.. list-table::
   :header-rows: 1
   :widths: 35 20 45

   * - Span Name
     - Kind
     - Description
   * - ``ydb.Driver.Initialize``
     - INTERNAL
     - Driver wait / endpoint discovery.
   * - ``ydb.CreateSession``
     - CLIENT
     - Creating a new query session.
   * - ``ydb.ExecuteQuery``
     - CLIENT
     - Executing a query (including ``execute_with_retries``).
   * - ``ydb.Commit``
     - CLIENT
     - Committing an explicit transaction.
   * - ``ydb.Rollback``
     - CLIENT
     - Rolling back a transaction.
   * - ``ydb.RunWithRetry``
     - INTERNAL
     - Umbrella span wrapping the whole retryable block (``retry_operation_*`` / ``retry_tx_*`` / ``execute_with_retries``).
   * - ``ydb.Try``
     - INTERNAL
     - A single retry attempt. Carries ``ydb.retry.backoff_ms``: how long the retrier slept before starting this attempt (``0`` for the first one).

All spans are nested under the currently active span, so wrapping your application
logic in a parent span produces a complete trace tree:

.. code-block:: python

    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("handle-request"):
        pool.execute_with_retries("SELECT 1")
        # ↳ ydb.CreateSession (if a new session is needed)
        # ↳ ydb.ExecuteQuery
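
The relationship between the ``ydb.RunWithRetry`` umbrella span and its ``ydb.Try`` children can be sketched without the SDK. Everything below is illustrative: ``run_with_retry`` and ``backoff_ms_for`` are hypothetical helpers, the capped exponential schedule is an assumption rather than the SDK's retry policy, and plain dicts stand in for OpenTelemetry spans.

```python
import time


def backoff_ms_for(attempt, base_ms=10, cap_ms=1000):
    """Hypothetical schedule: 0 before the first attempt, then capped exponential."""
    if attempt == 0:
        return 0
    return min(cap_ms, base_ms * 2 ** (attempt - 1))


def run_with_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run `operation`, recording one dict per attempt in place of a real span."""
    umbrella = {"name": "ydb.RunWithRetry", "children": []}
    for attempt in range(max_attempts):
        delay_ms = backoff_ms_for(attempt)
        if delay_ms:
            sleep(delay_ms / 1000)
        # Each attempt records how long the retrier slept before it started.
        child = {"name": "ydb.Try", "ydb.retry.backoff_ms": delay_ms}
        umbrella["children"].append(child)
        try:
            result = operation()
        except ConnectionError as exc:  # stand-in for a retryable error
            child["error.type"] = type(exc).__name__
            continue
        return result, umbrella
    raise RuntimeError("retries exhausted")


calls = {"n": 0}


def flaky():
    """Fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


# The sleep function is replaced so the sketch runs instantly.
result, trace_tree = run_with_retry(flaky, sleep=lambda seconds: None)
backoffs = [child["ydb.retry.backoff_ms"] for child in trace_tree["children"]]
```

In this sketch the first attempt carries a backoff of ``0`` and later attempts carry the delay that preceded them, which matches the description of ``ydb.retry.backoff_ms`` in the table.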


Span Attributes
---------------

Every YDB RPC (CLIENT-kind) span carries these semantic attributes:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Attribute
     - Description
   * - ``db.system.name``
     - Always ``"ydb"``.
   * - ``db.namespace``
     - Database path (e.g. ``"/local"``).
   * - ``server.address``
     - Host from the connection string.
   * - ``server.port``
     - Port from the connection string.
   * - ``network.peer.address``
     - Actual node host from the discovery endpoint map (set once the session is attached to a node).
   * - ``network.peer.port``
     - Actual node port from the discovery endpoint map.
   * - ``ydb.node.dc``
     - Data-center / location reported by discovery for the node (e.g. ``"vla"``, ``"sas"``).

Additional attributes are set when available:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Attribute
     - Description
   * - ``ydb.node.id``
     - YDB node that handled the request.

On errors, the span also records:

- ``error.type`` — ``"ydb_error"``, ``"transport_error"``, or the Python exception class name.
- ``db.response.status_code`` — the YDB status code name (e.g. ``"SCHEME_ERROR"``).
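
A minimal sketch of how an exception could be mapped onto these two attributes follows. The exception classes ``FakeYdbError`` and ``FakeTransportError`` and the helper ``error_attributes`` are hypothetical and are not the SDK's own types. Setting ``db.response.status_code`` only for server-reported errors is likewise an assumption made for the example.

```python
class FakeYdbError(Exception):
    """Hypothetical stand-in for an error returned by the YDB server."""

    def __init__(self, status_name):
        super().__init__(status_name)
        self.status_name = status_name


class FakeTransportError(Exception):
    """Hypothetical stand-in for a gRPC/transport-level failure."""


def error_attributes(exc):
    """Map an exception to the error attributes described above."""
    if isinstance(exc, FakeYdbError):
        return {
            "error.type": "ydb_error",
            "db.response.status_code": exc.status_name,
        }
    if isinstance(exc, FakeTransportError):
        return {"error.type": "transport_error"}
    # Anything else falls back to the Python exception class name.
    return {"error.type": type(exc).__name__}


scheme = error_attributes(FakeYdbError("SCHEME_ERROR"))
transport = error_attributes(FakeTransportError("connection reset"))
other = error_attributes(ValueError("bad argument"))
```

Keeping ``error.type`` to a small set of values plus a class-name fallback makes it practical to group failed spans by category in a tracing backend.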


Trace Context Propagation
-------------------------

When tracing is enabled, the SDK automatically injects trace context headers into
every gRPC call to YDB using the globally configured OpenTelemetry propagator
(``opentelemetry.propagate.inject``). By default, OpenTelemetry uses the
`W3C Trace Context <https://www.w3.org/TR/trace-context/>`_ propagator, which adds
``traceparent`` and ``tracestate`` headers.

YDB server expects W3C Trace Context headers, so the default propagator configuration
works out of the box. This allows the server to correlate client spans with
server-side processing, enabling end-to-end trace visibility across the entire
request path.
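
To make the header format concrete, the sketch below builds and parses a version ``00`` ``traceparent`` value by hand, using the example ids from the W3C specification. In the SDK this work is delegated to ``opentelemetry.propagate.inject``; the helpers ``make_traceparent``, ``parse_traceparent``, and ``inject_into``, and the ``x-example`` metadata key, are hypothetical and exist only for illustration.

```python
import re

# version "00", 32 hex chars of trace id, 16 hex chars of span id, 2 hex chars of flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id, span_id, sampled=True):
    """Format a version-00 traceparent value from integer ids."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(value):
    """Return (trace_id, span_id, sampled) or raise ValueError."""
    match = TRACEPARENT_RE.match(value)
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    trace_hex, span_hex, flags_hex = match.groups()
    return int(trace_hex, 16), int(span_hex, 16), bool(int(flags_hex, 16) & 0x01)


def inject_into(metadata, trace_id, span_id):
    """Append the header to a gRPC-style list of (key, value) pairs."""
    return metadata + [("traceparent", make_traceparent(trace_id, span_id))]


trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736
span_id = 0x00F067AA0BA902B7
metadata = inject_into([("x-example", "value")], trace_id, span_id)
header = dict(metadata)["traceparent"]
round_trip = parse_traceparent(header)
```

The sketch does not reject all-zero ids or handle ``tracestate``; a real propagator covers both.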


Async Usage
-----------

Tracing works identically with the async driver. Call ``enable_tracing()`` once at
startup:

.. code-block:: python

    import asyncio
    import ydb
    from ydb.opentelemetry import enable_tracing

    enable_tracing()

    async def main():
        async with ydb.aio.Driver(
            endpoint="grpc://localhost:2136",
            database="/local",
        ) as driver:
            await driver.wait(timeout=5)
            async with ydb.aio.QuerySessionPool(driver) as pool:
                await pool.execute_with_retries("SELECT 1")

    asyncio.run(main())



Using a Custom Tracer
---------------------

To use a specific tracer instead of the global one:

.. code-block:: python

    from opentelemetry import trace
    from ydb.opentelemetry import enable_tracing

    my_tracer = trace.get_tracer("my.custom.tracer")
    enable_tracing(tracer=my_tracer)


Running the Examples
--------------------

The ``examples/opentelemetry/`` directory contains ready-to-run examples with a Docker
Compose setup that starts YDB, an OTLP collector, Tempo, Prometheus, and Grafana:

.. code-block:: sh

    cd examples/opentelemetry
    docker compose -f compose-e2e.yaml up -d

    # Run the example
    python example.py

Open `http://localhost:3000 <http://localhost:3000>`_ (Grafana) to explore the
collected traces via the Tempo data source.
61 changes: 61 additions & 0 deletions examples/opentelemetry/compose-e2e.yaml
@@ -0,0 +1,61 @@
version: "3.3"
services:
  ydb:
    image: ydbplatform/local-ydb:trunk
    restart: always
    hostname: localhost
    platform: linux/amd64
    environment:
      YDB_DEFAULT_LOG_LEVEL: NOTICE
      GRPC_TLS_PORT: "2135"
      GRPC_PORT: "2136"
      MON_PORT: "8765"
      YDB_USE_IN_MEMORY_PDISKS: "true"
    command: [ "--config-path", "/ydb_config/ydb-config-with-tracing.yaml" ]
    ports:
      - "2135:2135"
      - "2136:2136"
      - "8765:8765"
    volumes:
      - ./ydb_config:/ydb_config:ro

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: [ "--config=/etc/otelcol/config.yaml" ]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol/config.yaml:ro
    ports:
      - "4317:4317"
      - "4318:4318"
      - "9464:9464"
      - "13133:13133"
      - "13317:55679"

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
    depends_on: [ otel-collector ]

  tempo:
    image: grafana/tempo:2.4.1
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml:ro
    ports:
      - "3200:3200"
    depends_on: [ otel-collector ]

  grafana:
    image: grafana/grafana:10.4.2
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
    ports:
      - "3000:3000"
    depends_on: [ prometheus, tempo ]
65 changes: 65 additions & 0 deletions examples/opentelemetry/example.py
@@ -0,0 +1,65 @@
"""Minimal example: OpenTelemetry tracing for YDB Python SDK."""

import asyncio

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

import ydb
from ydb.opentelemetry import enable_tracing

resource = Resource(attributes={"service.name": "ydb-example"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317")))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
enable_tracing(tracer)

ENDPOINT = "grpc://localhost:2136"
DATABASE = "/local"


def sync_example():
    """Sync: session execute and transaction execute + commit."""
    with ydb.Driver(endpoint=ENDPOINT, database=DATABASE) as driver:
        driver.wait(timeout=5)

        with ydb.QuerySessionPool(driver) as pool:
            with tracer.start_as_current_span("sync-example"):
                pool.execute_with_retries("SELECT 1")

                def tx_callee(session):
                    with session.transaction() as tx:
                        list(tx.execute("SELECT 1"))
                        tx.commit()

                pool.retry_operation_sync(tx_callee)


async def async_example():
    """Async: session execute and transaction execute + commit."""
    async with ydb.aio.Driver(endpoint=ENDPOINT, database=DATABASE) as driver:
        await driver.wait(timeout=5)

        async with ydb.aio.QuerySessionPool(driver) as pool:
            with tracer.start_as_current_span("async-example"):
                await pool.execute_with_retries("SELECT 1")

                async def tx_callee(session):
                    async with session.transaction() as tx:
                        result = await tx.execute("SELECT 1")
                        async for _ in result:
                            pass
                        await tx.commit()

                await pool.retry_operation_async(tx_callee)


sync_example()
asyncio.run(async_example())

provider.shutdown()