Changes from 35 commits
38 commits
6490021
+ otel sync tracing support
tewbo Mar 21, 2026
5998749
+ add async spans
tewbo Mar 21, 2026
db01212
+ test and refactor
tewbo Mar 24, 2026
acdc32f
* format
tewbo Mar 24, 2026
7bf72a9
* add otel to test requirements
tewbo Mar 24, 2026
74cc57d
fix black checkstyle
tewbo Mar 24, 2026
de1d6d9
fix flake8 checkstyle
tewbo Mar 24, 2026
3dda417
make property from driver config
tewbo Mar 24, 2026
7c620ae
Merge remote-tracking branch 'upstream/main' into otel-tracing-support
tewbo Apr 4, 2026
b574b77
add docs and fix pr review comments
tewbo Apr 9, 2026
7af5e2c
fix checkstyle and tests
tewbo Apr 9, 2026
3e6b95e
ci: retry failed workflow
tewbo Apr 9, 2026
350b3b6
feat(opentelemetry): retry-policy spans and per-node peer attributes
KirillKurdyukov Apr 20, 2026
3e55d61
refactor(opentelemetry): inline retry spans into ydb.retries
KirillKurdyukov Apr 20, 2026
e11b180
refactor(opentelemetry): peer from endpoint map; add ydb.node.dc; dro…
KirillKurdyukov Apr 20, 2026
70b778d
fix issue
KirillKurdyukov May 1, 2026
c66205a
fix issue
KirillKurdyukov May 1, 2026
29a76ac
Merge remote-tracking branch 'origin/main' into otel-tracing-support
KirillKurdyukov May 1, 2026
bce0e02
fix issue
KirillKurdyukov May 1, 2026
60b9a58
fix issue
KirillKurdyukov May 1, 2026
31b2cf2
fix issue
KirillKurdyukov May 2, 2026
763052a
fix issue
KirillKurdyukov May 2, 2026
dd60ff0
fix issue
KirillKurdyukov May 2, 2026
93fa974
fix issue
KirillKurdyukov May 2, 2026
c4e304d
fix issue
KirillKurdyukov May 2, 2026
0beb6d0
added ydb.BeginTransaction
KirillKurdyukov May 2, 2026
0220d7e
added healthcheck
KirillKurdyukov May 2, 2026
e6721d0
micro refactoring
KirillKurdyukov May 2, 2026
135916b
added tests
KirillKurdyukov May 3, 2026
b5264e2
refactoring
KirillKurdyukov May 3, 2026
75f95ea
fix linter
KirillKurdyukov May 3, 2026
9ec6959
fix linter
KirillKurdyukov May 3, 2026
515d57c
fix linter
KirillKurdyukov May 3, 2026
fc01331
fix linter
KirillKurdyukov May 3, 2026
e6deab1
fix issue
tewbo May 4, 2026
c72eb17
Refactor code
vgvoleg May 7, 2026
ac91ee6
Update test_tracing_async.py
vgvoleg May 7, 2026
febc9e3
review fixes
vgvoleg May 7, 2026
4 changes: 3 additions & 1 deletion .dockerignore
Original file line number Diff line number Diff line change
@@ -4,4 +4,6 @@
!README.md
!requirements.txt
!pyproject.toml
!setup.py
!setup.py
!examples/opentelemetry/otel_example.py
!examples/opentelemetry/requirements.txt
4 changes: 2 additions & 2 deletions .gitignore
@@ -1,10 +1,10 @@
__pycache__
ydb.egg-info/
/.idea
.idea/
/.vscode
/tox
/venv
/.venv
.venv/
/ydb_certs
/ydb_data
/tmp
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,10 @@
## Unreleased ##
* OpenTelemetry: W3C trace context for gRPC stays bound for the whole ``ExecuteQuery`` stream
(until the result iterator finishes); no long-lived ``context.attach`` on the span;
``disable_tracing()``; correct ``server.*`` from ``grpc://`` endpoints; zero work in
``create_ydb_span`` when tracing is off; one ``ydb.Try`` per attempt for fast retriable
errors in sync retries.

## 3.28.4 ##
* Fix iam module lazy loading

15 changes: 15 additions & 0 deletions docs/index.rst
@@ -26,6 +26,12 @@ Python client for `YDB <https://ydb.tech/>`_ — a fault-tolerant distributed SQ
coordination
scheme

.. toctree::
   :hidden:
   :caption: Observability

   opentelemetry

.. toctree::
   :hidden:
   :caption: Reference
@@ -103,6 +109,15 @@ use the ``@ydb_retry`` decorator. Skipping this section is a common source of pr
incidents.


Observability
-------------

The :doc:`opentelemetry` page explains how to add distributed tracing to your
application using OpenTelemetry. One call to ``enable_tracing()`` instruments
query sessions, transactions, and connection pool operations — so you can
visualize request flow in Jaeger, Grafana, or any OpenTelemetry-compatible backend.


API Reference
-------------

250 changes: 250 additions & 0 deletions docs/opentelemetry.rst
@@ -0,0 +1,250 @@
OpenTelemetry Tracing
=====================

The SDK provides built-in distributed tracing via `OpenTelemetry <https://opentelemetry.io/>`_.
When enabled, key YDB operations — such as session creation, query execution, transaction
commit/rollback, and driver initialization — produce OpenTelemetry spans. Trace
context is automatically propagated to the YDB server through gRPC metadata using the
`W3C Trace Context <https://www.w3.org/TR/trace-context/>`_ standard.

Tracing is **zero-cost when disabled**: the SDK uses no-op stubs by default, so there is
no overhead unless you explicitly opt in.


Installation
------------

OpenTelemetry packages are not included by default. Install the SDK with the
``opentelemetry`` extra:

.. code-block:: sh

    pip install ydb[opentelemetry]

This pulls in ``opentelemetry-api``. You will also need ``opentelemetry-sdk`` and an
exporter for your tracing backend, for example:

.. code-block:: sh

    # OTLP/gRPC exporter (works with Jaeger, Tempo, and others)
    pip install opentelemetry-exporter-otlp-proto-grpc


Enabling Tracing
----------------

Call ``enable_tracing()`` once, **after** configuring your OpenTelemetry tracer provider
and **before** creating a ``Driver``:

.. code-block:: python

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource

    import ydb
    from ydb.opentelemetry import enable_tracing

    # 1. Set up OpenTelemetry
    resource = Resource(attributes={"service.name": "my-service"})
    provider = TracerProvider(resource=resource)
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
    )
    trace.set_tracer_provider(provider)

    # 2. Enable YDB tracing
    enable_tracing()

    # 3. Use the SDK as usual — spans are created automatically
    with ydb.Driver(endpoint="grpc://localhost:2136", database="/local") as driver:
        driver.wait(timeout=5)
        with ydb.QuerySessionPool(driver) as pool:
            pool.execute_with_retries("SELECT 1")

    provider.shutdown()

``enable_tracing()`` accepts an optional ``tracer`` argument. If omitted, the SDK
obtains a tracer named ``"ydb.sdk"`` from the global tracer provider.

Repeated calls to ``enable_tracing()`` do nothing until you call ``disable_tracing()``,
which removes hooks so you can reconfigure or turn instrumentation off.
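
The enable/disable semantics described above can be pictured with a small toy model. This is only a sketch of the documented behaviour, not the SDK's implementation: the ``TracingSwitch`` class and its methods are hypothetical names invented for illustration. It shows why passing a new tracer to an already enabled switch has no effect until tracing is disabled first.

```python
class TracingSwitch:
    """Toy model (hypothetical) of an idempotent enable/disable toggle."""

    def __init__(self):
        self.tracer = None  # None means tracing is off

    def enable(self, tracer="global-tracer"):
        # A repeated enable is ignored: the first tracer stays in place.
        if self.tracer is not None:
            return False
        self.tracer = tracer
        return True

    def disable(self):
        # Drop the tracer so a later enable() can install a different one.
        self.tracer = None


switch = TracingSwitch()
switch.enable("tracer-a")
switch.enable("tracer-b")   # ignored, still "tracer-a"
switch.disable()
switch.enable("tracer-b")   # takes effect after disable()
```

In terms of the real API, the practical consequence is that switching tracers is a two-step operation: ``disable_tracing()`` followed by ``enable_tracing(tracer=...)``.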


What Is Instrumented
--------------------

The following operations produce spans:

.. list-table::
   :header-rows: 1
   :widths: 35 20 45

   * - Span Name
     - Kind
     - Description
   * - ``ydb.Driver.Initialize``
     - INTERNAL
     - Driver wait / endpoint discovery.
   * - ``ydb.CreateSession``
     - CLIENT
     - Creating a new query session.
   * - ``ydb.ExecuteQuery``
     - CLIENT
     - Executing a query (including ``execute_with_retries``).
   * - ``ydb.Commit``
     - CLIENT
     - Committing an explicit transaction.
   * - ``ydb.Rollback``
     - CLIENT
     - Rolling back a transaction.
   * - ``ydb.RunWithRetry``
     - INTERNAL
     - Umbrella span wrapping the whole retryable block (``retry_operation_*`` / ``retry_tx_*`` / ``execute_with_retries``).
   * - ``ydb.Try``
     - INTERNAL
     - A single retry attempt. From the **second** attempt onward carries
       ``ydb.retry.backoff_ms`` — how long the retrier slept before starting this
       attempt (``0`` on the skip-yield retry path: ``Aborted`` / ``BadSession`` /
       ``NotFound`` / ``InternalError``, where the protocol prescribes immediate
       retry without backoff). The very first ``ydb.Try`` omits the attribute
       entirely because nothing preceded it.
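
The ``ydb.retry.backoff_ms`` rule in the table can be illustrated with a self-contained simulation. This is a sketch under stated assumptions, not the SDK's retrier: ``RetriableError``, ``run_with_retry``, the ``immediate`` flag, and the exponential backoff formula are all hypothetical. Each dictionary in ``attempts`` stands in for one ``ydb.Try`` span.

```python
import time


class RetriableError(Exception):
    """Hypothetical stand-in for a retriable YDB error."""

    def __init__(self, message, immediate=False):
        super().__init__(message)
        self.immediate = immediate  # True models the retry-without-backoff path


def run_with_retry(operation, max_attempts=3, base_backoff_ms=10, sleep=time.sleep):
    attempts = []      # one entry per modelled "ydb.Try" span
    backoff_ms = None  # nothing precedes the first attempt
    for attempt in range(max_attempts):
        attributes = {}
        if backoff_ms is not None:
            # Only the second and later attempts carry the attribute.
            attributes["ydb.retry.backoff_ms"] = backoff_ms
        attempts.append({"name": "ydb.Try", "attributes": attributes})
        try:
            return operation(), attempts
        except RetriableError as exc:
            if attempt == max_attempts - 1:
                raise
            # Assumed policy: 0 ms for immediate retries, else exponential backoff.
            backoff_ms = 0 if exc.immediate else base_backoff_ms * 2 ** attempt
            if backoff_ms:
                sleep(backoff_ms / 1000)
```

With an operation that fails once on the immediate path, once on the backoff path, and then succeeds, the three recorded attempts carry no attribute, ``0``, and a positive value respectively.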

All spans are nested under the currently active span, so wrapping your application
logic in a parent span produces a complete trace tree:

.. code-block:: python

    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("handle-request"):
        pool.execute_with_retries("SELECT 1")
        # ↳ ydb.CreateSession   (if a new session is needed)
        # ↳ ydb.ExecuteQuery


Span Attributes
---------------

Every YDB RPC (CLIENT-kind) span carries these semantic attributes:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Attribute
     - Description
   * - ``db.system.name``
     - Always ``"ydb"``.
   * - ``db.namespace``
     - Database path (e.g. ``"/local"``).
   * - ``server.address``
     - Host from the connection string.
   * - ``server.port``
     - Port from the connection string.
   * - ``network.peer.address``
     - Actual node host from the discovery endpoint map (set once the session is attached to a node).
   * - ``network.peer.port``
     - Actual node port from the discovery endpoint map.
   * - ``ydb.node.dc``
     - Data-center / location reported by discovery for the node (e.g. ``"vla"``, ``"sas"``).

Additional attributes are set when available:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Attribute
     - Description
   * - ``ydb.node.id``
     - YDB node that handled the request.
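
To make the ``server.*`` attributes concrete, the sketch below derives them from an endpoint string with the standard library. It is an illustration only: ``server_attributes`` is a hypothetical helper, and the SDK's own parsing may differ. The point it demonstrates is that the ``grpc://`` or ``grpcs://`` scheme is stripped and the port is reported separately from the host.

```python
from urllib.parse import urlsplit


def server_attributes(endpoint: str, database: str) -> dict:
    """Hypothetical helper: build db.* and server.* attributes from an endpoint."""
    # urlsplit only recognises host and port when a scheme separator is present.
    if "://" not in endpoint:
        endpoint = "grpc://" + endpoint
    parts = urlsplit(endpoint)
    attributes = {
        "db.system.name": "ydb",
        "db.namespace": database,
        "server.address": parts.hostname,
    }
    if parts.port is not None:
        attributes["server.port"] = parts.port
    return attributes


attrs = server_attributes("grpc://localhost:2136", "/local")
```

Here ``attrs`` holds ``"localhost"`` as ``server.address`` and the integer ``2136`` as ``server.port``, matching the table above.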

On errors, the span also records:

- ``error.type`` — ``"ydb_error"``, ``"transport_error"``, or the Python exception class name.
- ``db.response.status_code`` — the YDB status code name (e.g. ``"SCHEME_ERROR"``).


Trace Context Propagation
-------------------------

When tracing is enabled, the SDK automatically injects trace context headers into
every gRPC call to YDB using the globally configured OpenTelemetry propagator
(``opentelemetry.propagate.inject``). By default, OpenTelemetry uses the
`W3C Trace Context <https://www.w3.org/TR/trace-context/>`_ propagator, which adds
``traceparent`` and ``tracestate`` headers.

YDB server expects W3C Trace Context headers, so the default propagator configuration
works out of the box. This allows the server to correlate client spans with
server-side processing, enabling end-to-end trace visibility across the entire
request path.
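
For readers unfamiliar with the header format, a version ``00`` ``traceparent`` value consists of four dash-separated lowercase hex fields: version, a 16-byte trace ID, an 8-byte parent span ID, and one byte of trace flags. The following sketch builds and parses such a value with the standard library only. The function names are illustrative and are not part of the SDK or of OpenTelemetry; in real applications the configured propagator does this work.

```python
import re

# version (2 hex) - trace id (32 hex) - parent span id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def build_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Format a version 00 traceparent header value."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(header: str) -> dict:
    """Parse a version 00 style traceparent value, rejecting malformed input."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid according to the specification.
    if version == "ff" or int(trace_id, 16) == 0 or int(span_id, 16) == 0:
        raise ValueError(f"invalid traceparent: {header!r}")
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


header = build_traceparent(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7)
parsed = parse_traceparent(header)
```

The server reads the trace ID and parent span ID from this header, which is what lets it attach its own spans to the client's trace.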


Async Usage
-----------

Tracing works identically with the async driver. Call ``enable_tracing()`` once at
startup:

.. code-block:: python

    import asyncio
    import ydb
    from ydb.opentelemetry import enable_tracing

    enable_tracing()

    async def main():
        async with ydb.aio.Driver(
            endpoint="grpc://localhost:2136",
            database="/local",
        ) as driver:
            await driver.wait(timeout=5)
            async with ydb.aio.QuerySessionPool(driver) as pool:
                await pool.execute_with_retries("SELECT 1")

    asyncio.run(main())



Using a Custom Tracer
---------------------

To use a specific tracer instead of the global one:

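.. code-block:: python

    from opentelemetry import trace
    from ydb.opentelemetry import enable_tracing

    # Spans produced by the SDK are created through this tracer
    # instead of the default "ydb.sdk" tracer.
    my_tracer = trace.get_tracer("my.custom.tracer")
    enable_tracing(tracer=my_tracer)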


Running the Examples
--------------------

The runnable script is ``examples/opentelemetry/otel_example.py`` (bank table + concurrent
Serializable transactions and ``app_startup`` / ``example_tli`` application spans). **Start
Docker (YDB or the full stack) first**, then install and run on the host — see
``examples/opentelemetry/README.md`` for the full order of commands and environment variables.

**Full stack in one command** (YDB + OTLP + Tempo + Grafana; the ``otel-example`` service is built from ``examples/opentelemetry/Dockerfile`` and runs the script once):

.. code-block:: sh

    docker compose -f examples/opentelemetry/compose-e2e.yaml up --build

The first run builds the ``otel-example`` image from the local SDK source; subsequent runs reuse the cached image. Pass ``--build`` again if you change the SDK or the demo script.

**Typical local run** (YDB in Docker, script on the host — Compose **before** ``pip`` / ``python``):

.. code-block:: sh

    docker compose up -d
    pip install -e '.[opentelemetry]' -r examples/opentelemetry/requirements.txt
    python examples/opentelemetry/otel_example.py

Open `http://localhost:3000 <http://localhost:3000>`_ (Grafana) to explore traces via Tempo.
21 changes: 21 additions & 0 deletions examples/opentelemetry/Dockerfile
@@ -0,0 +1,21 @@
# Isolated image for the OpenTelemetry demo. Build context is the repository root.
#
# docker compose -f examples/opentelemetry/compose-e2e.yaml build otel-example
#
# A separate ``.dockerignore`` at the repo root keeps the context small.

FROM python:3.11-slim

WORKDIR /app

# Dependency layer: copy only what setup.py needs so changes to the demo script do
# not bust the cached pip install.
COPY setup.py pyproject.toml README.md requirements.txt ./
COPY ydb ./ydb
COPY examples/opentelemetry/requirements.txt ./examples/opentelemetry/requirements.txt
RUN pip install --no-cache-dir -e '.[opentelemetry]' -r examples/opentelemetry/requirements.txt

# Demo script.
COPY examples/opentelemetry/otel_example.py ./examples/opentelemetry/otel_example.py

CMD ["python", "examples/opentelemetry/otel_example.py"]