Add E2E test verifying OTEL tracing across Zenko services #2378

Open
delthas wants to merge 1 commit into development/2.15 from improvement/ZENKO-5258/otel-tracing-e2e-test

@delthas delthas commented Apr 17, 2026

Summary

  • Deploy Jaeger all-in-one (memory-only, OTLP-enabled, pinned to 1.76.0)
    in the kind CI cluster alongside existing dependencies (Keycloak,
    Prometheus, etc.)
  • Patch the Zenko CR with spec.tracing (enabled, sampling ratio 1.0)
    so every request is traced during CI — this also acts as a smoke test
    that OTEL doesn't break existing @PreMerge tests
  • Add a new @PreMerge CTST scenario that puts an S3 object and then
    polls the Jaeger query API to assert a trace exists with spans from
    both connector-cloudserver and connector-vault (matching the
    OTEL_SERVICE_NAME values emitted by the operator)
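The assertion described in the last bullet can be sketched as follows. This is a minimal illustration, not the code from steps/otel-tracing.ts: the trimmed `JaegerTrace` shape, the helper names `servicesInTrace` and `missingServices`, and the sample trace are assumptions made for the example.

```typescript
// Trimmed shapes of a Jaeger query API trace (assumed, only the fields used here).
interface JaegerProcess {
    serviceName: string;
}

interface JaegerSpan {
    spanID: string;
    processID: string;
}

interface JaegerTrace {
    traceID: string;
    spans: JaegerSpan[];
    processes: Record<string, JaegerProcess>;
}

// Collect the distinct service names that emitted at least one span in the trace.
function servicesInTrace(trace: JaegerTrace): Set<string> {
    const names = new Set<string>();
    for (const span of trace.spans) {
        const proc = trace.processes[span.processID];
        if (proc !== undefined) {
            names.add(proc.serviceName);
        }
    }
    return names;
}

// Return the required services that have no span in the trace (hypothetical helper).
function missingServices(trace: JaegerTrace, required: string[]): string[] {
    const present = servicesInTrace(trace);
    return required.filter(name => !present.has(name));
}

// Hand-written sample trace, for illustration only.
const sampleTrace: JaegerTrace = {
    traceID: 'abc',
    spans: [
        { spanID: 's1', processID: 'p1' },
        { spanID: 's2', processID: 'p2' },
    ],
    processes: {
        p1: { serviceName: 'connector-cloudserver' },
        p2: { serviceName: 'connector-vault' },
    },
};
```

A step could then fail with the list returned by `missingServices(trace, ['connector-cloudserver', 'connector-vault'])` whenever it is non-empty, which makes the failure message name the service that did not report a span.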

What changed

| File | What |
| --- | --- |
| install-kind-dependencies.sh | Jaeger Deployment + Service YAML (pinned jaegertracing/all-in-one:1.76.0) |
| configure-e2e-ctst.sh | kubectl patch zenko with spec.tracing, wait for cloudserver + vault rollout |
| setup-e2e-env.sh | Jaeger port-forward + JaegerQueryEndpoint in world params |
| world/Zenko.ts | JaegerQueryEndpoint added to ZenkoWorldParameters |
| features/otel-tracing.feature | New @PreMerge scenario |
| steps/otel-tracing.ts | Jaeger polling (30s timeout / 2s interval), bucket-name trace filter (parallel-worker safe), trace assertions |
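The polling and bucket-name filtering listed for steps/otel-tracing.ts could look roughly like the sketch below. It is not the PR's implementation: `pollUntil`, `tracesForBucket`, and the `SpanTag` / `TraceWithTags` shapes are hypothetical names, and the filter assumes the bucket name shows up inside some string-valued span tag (for example a URL or path tag), which the source does not confirm.

```typescript
// Poll `probe` until it yields a value, or fail once the timeout would be exceeded.
// Defaults mirror the 30s timeout / 2s interval mentioned in the table.
async function pollUntil<T>(
    probe: () => Promise<T | undefined>,
    timeoutMs: number = 30000,
    intervalMs: number = 2000,
): Promise<T> {
    const deadline = Date.now() + timeoutMs;
    for (;;) {
        const result = await probe();
        if (result !== undefined) {
            return result;
        }
        // Stop if the next attempt would start after the deadline.
        if (Date.now() + intervalMs > deadline) {
            throw new Error(`no result after ${timeoutMs} ms`);
        }
        await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
    }
}

// Minimal shapes needed by the filter (assumed subset of the Jaeger JSON).
interface SpanTag {
    key: string;
    value: unknown;
}

interface SpanWithTags {
    tags: SpanTag[];
}

interface TraceWithTags {
    traceID: string;
    spans: SpanWithTags[];
}

// Keep only traces where at least one string tag value mentions the bucket name.
// Each parallel worker uses its own bucket, so this avoids matching another
// worker's trace.
function tracesForBucket<T extends TraceWithTags>(traces: T[], bucketName: string): T[] {
    return traces.filter(trace =>
        trace.spans.some(span =>
            span.tags.some(tag =>
                typeof tag.value === 'string' && tag.value.includes(bucketName))));
}
```

In a step, the probe passed to `pollUntil` would query the Jaeger endpoint, run `tracesForBucket` on the response, and return the first match or `undefined` to keep polling. Polling is needed because spans are exported asynchronously, so a trace may not be queryable immediately after the S3 request returns.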

Why

Parent ticket OS-1072
tracks adding OpenTelemetry tracing across the Zenko stack. This PR
adds CI coverage to verify that traces actually propagate end-to-end
(cloudserver → vault) once tracing is enabled on the CR.

Dependencies

Requires zenko-operator PR #607 (ZKOP-539)
to be merged and the bundled operator version bumped in
solution/deps.yaml — it adds the spec.tracing CRD field and
propagates ENABLE_OTEL / OTEL_* env vars to the managed
deployments. Without it, the kubectl patch here will be rejected
as an unknown field.

Issue: ZENKO-5258

bert-e commented Apr 17, 2026

Hello delthas,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

| name | description | privileged | authored |
| --- | --- | --- | --- |
| /after_pull_request | Wait for the given pull request id to be merged before continuing with the current one. | | |
| /bypass_author_approval | Bypass the pull request author's approval | | |
| /bypass_build_status | Bypass the build and test status | | |
| /bypass_commit_size | Bypass the check on the size of the changeset TBA | | |
| /bypass_incompatible_branch | Bypass the check on the source branch prefix | | |
| /bypass_jira_check | Bypass the Jira issue check | | |
| /bypass_peer_approval | Bypass the pull request peers' approval | | |
| /bypass_leader_approval | Bypass the pull request leaders' approval | | |
| /approve | Instruct Bert-E that the author has approved the pull request. | | ✍️ |
| /create_pull_requests | Allow the creation of integration pull requests. | | |
| /create_integration_branches | Allow the creation of integration branches. | | |
| /no_octopus | Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead | | |
| /unanimity | Change review acceptance criteria from one reviewer at least to all reviewers | | |
| /wait | Instruct Bert-E not to run until further notice. | | |

Available commands

| name | description | privileged |
| --- | --- | --- |
| /help | Print Bert-E's manual in the pull request. | |
| /status | Print Bert-E's current status in the pull request TBA | |
| /clear | Remove all comments from Bert-E from the history TBA | |
| /retry | Re-start a fresh build TBA | |
| /build | Re-start a fresh build TBA | |
| /force_reset | Delete integration branches & pull requests, and restart merge process from the beginning. | |
| /reset | Try to remove integration branches unless there are commits on them which do not appear on the source branch. | |

Status report is not available.

bert-e commented Apr 17, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@delthas delthas force-pushed the improvement/ZENKO-5258/otel-tracing-e2e-test branch from c362986 to 9a3a251 Compare April 21, 2026 13:20
@delthas delthas force-pushed the improvement/ZENKO-5258/otel-tracing-e2e-test branch from 9a3a251 to f32a16b Compare May 29, 2026 14:22
@delthas delthas changed the base branch from development/2.14 to development/2.15 May 29, 2026 14:22
Comment thread .github/scripts/end2end/install-kind-dependencies.sh Outdated
Comment thread tests/functional/ctst/features/otel-tracing.feature Outdated
@delthas delthas force-pushed the improvement/ZENKO-5258/otel-tracing-e2e-test branch from f32a16b to a7b196c Compare May 29, 2026 14:34
@scality scality deleted a comment from bert-e May 29, 2026
bert-e commented May 29, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@delthas delthas force-pushed the improvement/ZENKO-5258/otel-tracing-e2e-test branch from a7b196c to c9ae52e Compare May 29, 2026 14:38
@scality scality deleted a comment from bert-e May 29, 2026
@scality scality deleted a comment from bert-e May 29, 2026
@delthas delthas marked this pull request as ready for review May 29, 2026 14:41
- Deploy Jaeger all-in-one (memory-only, OTLP-enabled) in the kind CI
  cluster alongside existing dependencies
- Patch the Zenko CR with `spec.otel` (enabled, sampling ratio 1.0) so
  every request is traced during CI — also acts as a smoke test that
  OTEL doesn't break existing @premerge tests
- Add a new @premerge CTST scenario that puts an S3 object and then
  polls the Jaeger query API to assert a trace exists with spans from
  both cloudserver and vault

Issue: ZENKO-5258
@delthas delthas force-pushed the improvement/ZENKO-5258/otel-tracing-e2e-test branch from c9ae52e to 79d656d Compare May 29, 2026 14:41
@delthas delthas requested review from a team, SylvainSenechal, benzekrimaha and maeldonn and removed request for benzekrimaha May 29, 2026 14:43

@francoisferrand francoisferrand left a comment

Overall, I am not sure where to stand here. There is a fundamental trade-off: do we want to test the "production" case first and foremost (i.e. without tracing), or is it safe enough to enable OTEL all the time (i.e. no longer testing the production configuration, which could hide crashes or introduce subtle delays)?

# The CR is patched later, after file-backend SSE tests have run.
bash "$(dirname "$0")/../mocks/setup-kmip.sh"

# Enable OTEL tracing on the Zenko CR (always-on in CI, acts as a smoke test)

How much time does this add to CI (deployment time)?
Is there a performance impact? (Especially if a performance degradation prevents us from hitting race conditions, I am not sure that is better :-/ )

kubectl patch zenko ${ZENKO_NAME} -n ${NAMESPACE} --type merge -p '{
"spec": {"tracing": {
"enabled": true,
"samplingRatio": "1.0",

Trace every request?


# Enable OTEL tracing on the Zenko CR (always-on in CI, acts as a smoke test)
NAMESPACE="${NAMESPACE:-default}"
kubectl patch zenko ${ZENKO_NAME} -n ${NAMESPACE} --type merge -p '{

Instead of patching Zenko (i.e. triggering a re-reconcile), this should be added to the Zenko CR before it is installed on the cluster.

kubectl rollout status sts/keycloak --timeout=10m

# jaeger all-in-one (OTLP collector + query UI, memory-only)
kubectl apply -f - <<'EOF'

best to use a separate CR, and keep the bash script simple?

@2.15.0
@PreMerge
Feature: OpenTelemetry Tracing
Traces should propagate across Zenko services

Instead of tracing every request, could we "trust" localhost and let the test initiate a request with an OTEL trace ID?
This may limit the impact of the change.
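For illustration, a test-initiated trace ID would typically be carried in a W3C Trace Context `traceparent` header, whose format is `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`, with flags `01` meaning "sampled". The sketch below only builds such a header; `newTraceparent` and its helpers are hypothetical names, and whether the Zenko services would honor an incoming sampled flag (for example through a parent-based sampler) is an assumption not confirmed in this thread.

```typescript
// Produce `length` random lowercase hex characters.
// Math.random is not cryptographically secure, which is acceptable for test IDs.
function randomHex(length: number): string {
    let out = '';
    while (out.length < length) {
        out += Math.floor(Math.random() * 16).toString(16);
    }
    return out;
}

// The spec forbids all-zero trace and parent IDs, so retry in that (unlikely) case.
function nonZeroHex(length: number): string {
    let hex = randomHex(length);
    while (/^0+$/.test(hex)) {
        hex = randomHex(length);
    }
    return hex;
}

// Build a version-00 traceparent header with the sampled flag set.
function newTraceparent(): { traceId: string; header: string } {
    const traceId = nonZeroHex(32);
    const parentId = nonZeroHex(16);
    return { traceId, header: `00-${traceId}-${parentId}-01` };
}
```

The test would send `header` as the `traceparent` HTTP header on its S3 request and keep `traceId` to look up that exact trace afterwards, instead of searching by service and filtering by bucket name.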

Feature: OpenTelemetry Tracing
Traces should propagate across Zenko services

Scenario: S3 PutObject produces a trace spanning cloudserver and vault

Enabling tracing all the time means we are no longer testing the "production" case.

Should we instead enable and disable tracing at the beginning and end of this test? (It would trigger a reconcile and possibly instability, if we are not resilient, but would ensure we test both cases.)

Comment on lines +9 to +30
interface JaegerProcess {
    serviceName: string;
    tags: { key: string; value: string }[];
}

interface JaegerSpan {
    traceID: string;
    spanID: string;
    operationName: string;
    processID: string;
    tags: { key: string; type: string; value: unknown }[];
}

interface JaegerTrace {
    traceID: string;
    spans: JaegerSpan[];
    processes: Record<string, JaegerProcess>;
}

interface JaegerSearchResponse {
    data: JaegerTrace[];
}

  • Are these really Jaeger-specific, or standard OpenTelemetry?
  • Can't we use types from some SDK instead of redefining these?

 "UtilizationServicePort":"${UTILIZATION_SERVICE_PORT}",
-"KubeconfigPath":"${KUBECONFIG:-${HOME}/.kube/config}"
+"KubeconfigPath":"${KUBECONFIG:-${HOME}/.kube/config}",
+"JaegerQueryEndpoint":"${JAEGER_QUERY_ENDPOINT}"

can we add trailing comma already?
so we can add new lines without modifying the previous/last one?


3 participants