Replace dogstatsd metrics with opentelemetry metrics (take 2) #4025

Merged: DrJosh9000 merged 3 commits into v4 from opentel-metrics-take-2 on Jun 24, 2026

Conversation

@DrJosh9000 DrJosh9000 commented Jun 24, 2026

Contributor

Description

From #3874:

Since datadog metrics were added in (checks notes) 2018 (thanks @lox!), OpenTelemetry has emerged as a standard for metrics that's portable between vendors. Our customers work in all sorts of environments, and not all of them are datadog subscribers.

This in mind, let's pull out the datadog-specific metrics, and replace them with OpenTelemetry ones.

Context

Closes #3874

@moskyb:

I've wanted to do this since the day I joined Buildkite.

Changes

#3874 but rebased onto current v4. (I've been repeatedly rebasing v4 onto main during this time.)

  • Replaces datadog metrics with opentel ones
  • Replaces the jobs.success and jobs.failed counters with jobs.finished — failure or success can be inferred with the exit_status tag that's applied to the metric
  • Renames --tracing-service-name to --telemetry-service-name, and BUILDKITE_TRACING_SERVICE_NAME to BUILDKITE_TELEMETRY_SERVICE_NAME
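
To illustrate the second bullet, here is a minimal sketch of how the old jobs.success and jobs.failed totals could be recovered from a single jobs.finished counter keyed by exit_status. It uses only the Go standard library: the finishedCounter type and its methods are hypothetical stand-ins, not the agent's code and not the OpenTelemetry API. It also assumes that any non-zero exit status counts as a failure.

```go
package main

import "fmt"

// finishedCounter is a hypothetical stand-in for a "jobs.finished" counter.
// It keeps one running total per exit_status value, which is roughly what a
// metrics backend does when a counter carries an exit_status attribute.
type finishedCounter struct {
	byExitStatus map[int]int64
}

func newFinishedCounter() *finishedCounter {
	return &finishedCounter{byExitStatus: make(map[int]int64)}
}

// Add records one finished job with the given exit status.
func (c *finishedCounter) Add(exitStatus int) {
	c.byExitStatus[exitStatus]++
}

// Succeeded returns the equivalent of the old jobs.success counter:
// jobs that finished with exit_status 0.
func (c *finishedCounter) Succeeded() int64 {
	return c.byExitStatus[0]
}

// Failed returns the equivalent of the old jobs.failed counter, assuming
// every non-zero exit_status is a failure.
func (c *finishedCounter) Failed() int64 {
	var total int64
	for status, count := range c.byExitStatus {
		if status != 0 {
			total += count
		}
	}
	return total
}

func main() {
	c := newFinishedCounter()
	for _, status := range []int{0, 0, 1, 0, 137} {
		c.Add(status)
	}
	fmt.Println("succeeded:", c.Succeeded()) // succeeded: 3
	fmt.Println("failed:", c.Failed())       // failed: 2
}
```

In a real backend the same split would be a query that filters or groups jobs.finished by exit_status, rather than code in the agent.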

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go tool gofumpt -extra -w .)

Disclosures / Credits

All the good work @moskyb did in #3874

@DrJosh9000 DrJosh9000 requested review from a team as code owners June 24, 2026 03:48
@DrJosh9000 DrJosh9000 added the v4 (Breaking changes that will be included in Agent v4) and change (Not a new feature, but a user observable non-breaking behavior change.) labels Jun 24, 2026
@DrJosh9000 DrJosh9000 requested a review from moskyb June 24, 2026 03:50
@DrJosh9000 DrJosh9000 changed the title from "Replace dogstatsd metrics with opentelemetry metrics" to "Replace dogstatsd metrics with opentelemetry metrics (take 2)" Jun 24, 2026

@CerealBoy CerealBoy left a comment

Member

One tiny question, LGTM 🚀 🚀

Comment thread agent/run_job.go

if exit.Status == 0 {
	jobMetrics.Timing("jobs.duration.success", finishedAt.Sub(r.startedAt))
	jobMetrics.Count("jobs.success", 1)

Member

Did we still want a success / failed metric here?

Contributor Author

#3874 specifically included in the changes:

Replaces the jobs.success and jobs.failed counters with jobs.finished — failure or success can be inferred with the exit_status tag that's applied to the metric

@DrJosh9000

Contributor Author

I think I left off two commits. One sec...

@moskyb moskyb left a comment

Contributor

🙌

Comment thread clicommand/agent_start.go
  &cli.StringFlag{
  	Name:  "tracing-service-name",
- 	Usage: "Service name to use when reporting traces.",
+ 	Usage: "Service name to use when reporting telemetry.",

Contributor

Should the flag also change to telemetry-service-name?

Contributor Author

Done, including the env var
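
Because both the flag and the environment variable are renamed, any wrapper that launches the agent with the old variable will need updating. The following is a rough sketch of a helper such a wrapper could use to rewrite its environment before starting the agent. It is not agent behavior, and this PR does not mention any built-in fallback to the old name: migrateServiceNameEnv is a hypothetical function, and the rule that the new name wins when both are set is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldServiceNameVar = "BUILDKITE_TRACING_SERVICE_NAME"
	newServiceNameVar = "BUILDKITE_TELEMETRY_SERVICE_NAME"
)

// migrateServiceNameEnv is a hypothetical helper for launcher scripts.
// It takes environment entries in "KEY=value" form and returns a copy in
// which the old variable is renamed to the new one. If the new variable is
// already present, the old entry is dropped so the new value wins
// (an assumption, not documented agent behavior).
func migrateServiceNameEnv(env []string) []string {
	hasNew := false
	for _, kv := range env {
		if strings.HasPrefix(kv, newServiceNameVar+"=") {
			hasNew = true
			break
		}
	}

	out := make([]string, 0, len(env))
	for _, kv := range env {
		if !strings.HasPrefix(kv, oldServiceNameVar+"=") {
			out = append(out, kv) // unrelated entry: keep as-is
			continue
		}
		if !hasNew {
			value := strings.TrimPrefix(kv, oldServiceNameVar+"=")
			out = append(out, newServiceNameVar+"="+value)
		}
		// Old entry with the new name already set: dropped.
	}
	return out
}

func main() {
	env := []string{"PATH=/usr/bin", oldServiceNameVar + "=my-agent"}
	for _, kv := range migrateServiceNameEnv(env) {
		fmt.Println(kv)
	}
}
```

A launcher could pass the result of migrateServiceNameEnv(os.Environ()) as the Env field of an exec.Cmd when starting the agent.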

@DrJosh9000 DrJosh9000 force-pushed the opentel-metrics-take-2 branch 2 times, most recently from a3cd613 to eec4c75 June 24, 2026 04:51
@DrJosh9000 DrJosh9000 force-pushed the opentel-metrics-take-2 branch from eec4c75 to df7e0af June 24, 2026 04:55
@DrJosh9000 DrJosh9000 merged commit 9332390 into v4 Jun 24, 2026
3 of 4 checks passed
@DrJosh9000 DrJosh9000 deleted the opentel-metrics-take-2 branch June 24, 2026 04:56
DrJosh9000 added 3 commits that referenced this pull request Jun 24, 2026, each titled:
Replace dogstatsd metrics with opentelemetry metrics (take 2)
Labels

  • change: Not a new feature, but a user observable non-breaking behavior change.
  • v4: Breaking changes that will be included in Agent v4

3 participants