Open
Changes from 8 commits
Commits
38 commits
6490021
+ otel sync tracing support
tewbo Mar 21, 2026
5998749
+ add async spans
tewbo Mar 21, 2026
db01212
+ test and refactor
tewbo Mar 24, 2026
acdc32f
* format
tewbo Mar 24, 2026
7bf72a9
* add otel to test requirements
tewbo Mar 24, 2026
74cc57d
fix black checkstyle
tewbo Mar 24, 2026
de1d6d9
fix flake8 checkstyle
tewbo Mar 24, 2026
3dda417
make property from driver config
tewbo Mar 24, 2026
7c620ae
Merge remote-tracking branch 'upstream/main' into otel-tracing-support
tewbo Apr 4, 2026
b574b77
add docs and fix pr review comments
tewbo Apr 9, 2026
7af5e2c
fix checkstyle and tests
tewbo Apr 9, 2026
3e6b95e
ci: retry failed workflow
tewbo Apr 9, 2026
350b3b6
feat(opentelemetry): retry-policy spans and per-node peer attributes
KirillKurdyukov Apr 20, 2026
3e55d61
refactor(opentelemetry): inline retry spans into ydb.retries
KirillKurdyukov Apr 20, 2026
e11b180
refactor(opentelemetry): peer from endpoint map; add ydb.node.dc; dro…
KirillKurdyukov Apr 20, 2026
70b778d
fix issue
KirillKurdyukov May 1, 2026
c66205a
fix issue
KirillKurdyukov May 1, 2026
29a76ac
Merge remote-tracking branch 'origin/main' into otel-tracing-support
KirillKurdyukov May 1, 2026
bce0e02
fix issue
KirillKurdyukov May 1, 2026
60b9a58
fix issue
KirillKurdyukov May 1, 2026
31b2cf2
fix issue
KirillKurdyukov May 2, 2026
763052a
fix issue
KirillKurdyukov May 2, 2026
dd60ff0
fix issue
KirillKurdyukov May 2, 2026
93fa974
fix issue
KirillKurdyukov May 2, 2026
c4e304d
fix issue
KirillKurdyukov May 2, 2026
0beb6d0
added ydb.BeginTransaction
KirillKurdyukov May 2, 2026
0220d7e
added healthcheck
KirillKurdyukov May 2, 2026
e6721d0
micro refactoring
KirillKurdyukov May 2, 2026
135916b
added tests
KirillKurdyukov May 3, 2026
b5264e2
refactoring
KirillKurdyukov May 3, 2026
75f95ea
fix linter
KirillKurdyukov May 3, 2026
9ec6959
fix linter
KirillKurdyukov May 3, 2026
515d57c
fix linter
KirillKurdyukov May 3, 2026
fc01331
fix linter
KirillKurdyukov May 3, 2026
e6deab1
fix issue
tewbo May 4, 2026
c72eb17
Refactor code
vgvoleg May 7, 2026
ac91ee6
Update test_tracing_async.py
vgvoleg May 7, 2026
febc9e3
review fixes
vgvoleg May 7, 2026
61 changes: 61 additions & 0 deletions examples/opentelemetry/compose-e2e.yaml
@@ -0,0 +1,61 @@
version: "3.3"
services:
  ydb:
    image: ydbplatform/local-ydb:trunk
    restart: always
    hostname: localhost
    platform: linux/amd64
    environment:
      YDB_DEFAULT_LOG_LEVEL: NOTICE
      GRPC_TLS_PORT: "2135"
      GRPC_PORT: "2136"
      MON_PORT: "8765"
      YDB_USE_IN_MEMORY_PDISKS: "true"
    command: [ "--config-path", "/ydb_config/ydb-config-with-tracing.yaml" ]
    ports:
      - "2135:2135"
      - "2136:2136"
      - "8765:8765"
    volumes:
      - ./ydb_config:/ydb_config:ro

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: [ "--config=/etc/otelcol/config.yaml" ]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol/config.yaml:ro
    ports:
      - "4317:4317"
      - "4318:4318"
      - "9464:9464"
      - "13133:13133"
      - "13317:55679"

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
    depends_on: [ otel-collector ]

  tempo:
    image: grafana/tempo:2.4.1
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml:ro
    ports:
      - "3200:3200"
    depends_on: [ otel-collector ]

  grafana:
    image: grafana/grafana:10.4.2
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
    ports:
      - "3000:3000"
    depends_on: [ prometheus, tempo ]
66 changes: 66 additions & 0 deletions examples/opentelemetry/example.py
@@ -0,0 +1,66 @@
"""Minimal example: OpenTelemetry tracing for YDB Python SDK."""

import asyncio

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

import ydb
from ydb.opentelemetry import enable_tracing

resource = Resource(attributes={"service.name": "ydb-example"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317")))
trace.set_tracer_provider(provider)

enable_tracing()

tracer = trace.get_tracer(__name__)

ENDPOINT = "grpc://localhost:2136"
DATABASE = "/local"


def sync_example():
    """Sync: session execute and transaction execute + commit."""
    with ydb.Driver(endpoint=ENDPOINT, database=DATABASE) as driver:
        driver.wait(timeout=5)

        with ydb.QuerySessionPool(driver) as pool:
            with tracer.start_as_current_span("sync-example"):
                pool.execute_with_retries("SELECT 1")

                def tx_callee(session):
                    with session.transaction() as tx:
                        list(tx.execute("SELECT 1"))
                        tx.commit()

                pool.retry_operation_sync(tx_callee)


async def async_example():
    """Async: session execute and transaction execute + commit."""
    async with ydb.aio.Driver(endpoint=ENDPOINT, database=DATABASE) as driver:
        await driver.wait(timeout=5)

        async with ydb.aio.QuerySessionPool(driver) as pool:
            with tracer.start_as_current_span("async-example"):
                await pool.execute_with_retries("SELECT 1")

                async def tx_callee(session):
                    async with session.transaction() as tx:
                        result = await tx.execute("SELECT 1")
                        async for _ in result:
                            pass
                        await tx.commit()

                await pool.retry_operation_async(tx_callee)


sync_example()
asyncio.run(async_example())

provider.shutdown()
5 changes: 5 additions & 0 deletions examples/opentelemetry/grafana/dashboards/README.md
@@ -0,0 +1,5 @@
This folder is intentionally left empty.

Grafana is provisioned with Tempo + Prometheus datasources; use **Explore** to search traces.


@@ -0,0 +1,13 @@
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: true
    editable: false
    options:
      path: /var/lib/grafana/dashboards


@@ -0,0 +1,22 @@
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    editable: false
    jsonData:
      tracesToMetrics:
        datasourceUid: Prometheus
      serviceMap:
        datasourceUid: Prometheus


44 changes: 44 additions & 0 deletions examples/opentelemetry/otel-collector-config.yaml
@@ -0,0 +1,44 @@
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: { }

exporters:
  prometheus:
    endpoint: 0.0.0.0:9464
    resource_to_telemetry_conversion:
      enabled: true

  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

  debug:
    verbosity: detailed

extensions:
  health_check:
    endpoint: 0.0.0.0:13133

  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [ health_check, zpages ]
  pipelines:
    metrics:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ prometheus ]

    traces:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ otlp/tempo, debug ]
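
The `health_check` extension configured above listens on port 13133, which `compose-e2e.yaml` publishes on the host. A minimal sketch of waiting for the collector before running the example follows. It assumes the extension answers plain HTTP GET requests with status 200 once the collector is ready; the helper name `wait_for_http_ok` and its parameters are hypothetical and not part of this PR.

```python
import time
import urllib.error
import urllib.request


def wait_for_http_ok(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: the URL and the "200 means ready" rule are
    assumptions about the collector's health_check extension.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=max(interval, 0.1)) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx status: retry below.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the compose stack running, `wait_for_http_ok("http://localhost:13133/")` would return `True` once the collector reports healthy, and `False` if it does not do so within the timeout.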
7 changes: 7 additions & 0 deletions examples/opentelemetry/prometheus.yaml
@@ -0,0 +1,7 @@
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: otel-collector
    static_configs:
      - targets: ["otel-collector:9464"]
15 changes: 15 additions & 0 deletions examples/opentelemetry/tempo.yaml
@@ -0,0 +1,15 @@
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo
28 changes: 28 additions & 0 deletions examples/opentelemetry/ydb_config/README.md
@@ -0,0 +1,28 @@
# YDB server-side tracing (OpenTelemetry)

This folder holds a **custom YDB config** that enables server-side OpenTelemetry tracing.

## 1) Export the default config from a running container

If YDB is running as `ydb-local`:

```bash
docker cp ydb-local:/ydb_data/cluster/kikimr_configs/config.yaml ./ydb_config/ydb-config.yaml
```

## 2) Enable OpenTelemetry exporter in the config

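Edit `ydb-config.yaml` and add the contents of `otel-tracing-snippet.yaml` (usually as a top-level section). Save the result as `ydb-config-with-tracing.yaml`, the file name that `compose-e2e.yaml` passes to `--config-path`.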

Default OTLP endpoint (inside docker-compose network): `grpc://otel-collector:4317`
Default service name (so you can find it in Tempo/Grafana): `ydb`
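
The merge in this step can also be scripted. The sketch below is one possible approach, not part of this PR: it appends the snippet to the exported config as plain text and writes the result under the name `compose-e2e.yaml` passes to `--config-path`. The function name `merge_tracing_snippet` is hypothetical, and the code assumes the exported config has no top-level `tracing_config` key yet (it refuses to write if one is found).

```python
from pathlib import Path


def merge_tracing_snippet(base: Path, snippet: Path, out: Path) -> bool:
    """Append the snippet to the base config as a new top-level section.

    Returns False and writes nothing if the base config already contains a
    top-level `tracing_config` key, to avoid producing duplicate keys.
    """
    base_text = base.read_text(encoding="utf-8")
    if any(line.startswith("tracing_config:") for line in base_text.splitlines()):
        return False
    snippet_text = snippet.read_text(encoding="utf-8")
    # Separate the two documents' content with one blank line.
    out.write_text(base_text.rstrip("\n") + "\n\n" + snippet_text, encoding="utf-8")
    return True
```

Run from `examples/opentelemetry`, a call such as `merge_tracing_snippet(Path("ydb_config/ydb-config.yaml"), Path("ydb_config/otel-tracing-snippet.yaml"), Path("ydb_config/ydb-config-with-tracing.yaml"))` would produce the file the compose service expects. A text append keeps the sketch dependency-free; a YAML-aware merge would be more robust if the exported config already has tracing settings.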

## 3) Run with the overridden config

Restart YDB (`compose-e2e.yaml` always passes `--config-path /ydb_config/ydb-config-with-tracing.yaml`, so that file must exist in `./ydb_config`):

```bash
docker-compose -f compose-e2e.yaml up -d --force-recreate ydb
```

Now you should see additional server-side traces in Tempo/Grafana (the service name is `ydb` in the snippet).
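
To check for server-side traces without opening Grafana, Tempo can be queried over HTTP on the port published in `compose-e2e.yaml` (3200). The sketch below assumes Tempo's `/api/search` endpoint with its `tags` parameter and a JSON response carrying a `traces` list; the function names are hypothetical, and `ydb` is the service name set in `otel-tracing-snippet.yaml`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def tempo_search_url(base_url: str, service_name: str, limit: int = 20) -> str:
    """Build a Tempo search URL filtering by the `service.name` attribute."""
    query = urlencode({"tags": f"service.name={service_name}", "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{query}"


def find_traces(base_url: str = "http://localhost:3200", service_name: str = "ydb") -> list:
    """Return the matching trace summaries, or an empty list if none are found.

    Assumption: the response body is JSON with a top-level "traces" array.
    """
    with urlopen(tempo_search_url(base_url, service_name), timeout=5) as resp:
        return json.load(resp).get("traces", [])
```

Calling `find_traces()` after running `example.py` against the stack should return a non-empty list once YDB has exported spans; an empty list suggests the config override was not picked up or the traces have not been flushed yet.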
26 changes: 26 additions & 0 deletions examples/opentelemetry/ydb_config/otel-tracing-snippet.yaml
@@ -0,0 +1,26 @@
tracing_config:
  backend:
    opentelemetry:
      collector_url: grpc://otel-collector:4317
      service_name: ydb
  external_throttling:
    - scope:
        database: /local
      max_traces_per_minute: 60
      max_traces_burst: 3
  # Highest tracing detail for *sampled* traces (YDB-generated trace-id).
  # Note: requests with an external `traceparent` are traced at level 13 (Detailed) per YDB docs.
  sampling:
    - scope:
        database: /local
      fraction: 1
      level: 15
      max_traces_per_minute: 1000
      max_traces_burst: 100
  uploader:
    max_exported_spans_per_second: 30
    max_spans_in_batch: 100
    max_bytes_in_batch: 10485760 # 10 MiB
    max_export_requests_inflight: 3
    max_batch_accumulation_milliseconds: 5000
    span_export_timeout_seconds: 120