[elastic_agent] Add TSDB dimensions and fix metric_type declarations #18508
AndersonQ wants to merge 1 commit into elastic:main
Conversation
Add `dimension: true` to fields across the 12 elastic_agent metrics
data streams. The added dimensions are redundant with the existing
dimensions (1-to-1 with `agent.id` or `component.id`) and therefore
do not increase time-series cardinality.
Fix metric_type on:
- beat.stats.libbeat.pipeline.events.active: counter -> gauge
- beat.stats.libbeat.output.events.active: counter -> gauge
- filebeat_input.*.histogram.count: gauge -> counter
Add metric_type on numeric fields that were missing it: system stats,
cpu ticks, memstats, handles, runtime.goroutines, uptime, cgroup
stats, libbeat pipeline/config/output metrics, write-latency histogram,
system.process.cgroup.{memory.mem.failures, cpuacct.percpu}, and
filebeat_input.{cel_executions, system_packet_drops}.
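As a sketch, the kinds of changes described above look roughly like this in a data stream's `fields.yml` (field names are illustrative excerpts, not the full files):

```yaml
# Hypothetical excerpt; the actual fields.yml files list many more fields.
- name: agent.id
  type: keyword
  dimension: true          # added; 1-to-1 with the existing dimensions,
                           # so time-series cardinality is unchanged
- name: beat.stats.libbeat.pipeline.events.active
  type: long
  metric_type: gauge       # was counter; "active" is a point-in-time level
- name: filebeat_input.histogram.count
  type: long
  metric_type: counter     # was gauge; a histogram count only increases
```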
Assisted by Claude Code
Force-pushed from 9c202fc to d2cf4f4
@@ -195,6 +197,7 @@
      description: CPU time consumed by tasks in user (kernel) mode.
  - name: percpu
    type: long
🟡 Medium fields/fields.yml:199
system.process.cgroup.cpuacct.percpu is marked metric_type: gauge, but it represents cumulative CPU time consumed on each CPU — a monotonically increasing value. TSDB will compute incorrect rates for this field, producing wrong aggregation results. Change to metric_type: counter to match the sibling cpuacct.*.ns fields.
Also found in 8 other location(s)
packages/elastic_agent/data_stream/cloudbeat_metrics/fields/fields.yml:195
`system.process.cgroup.cpuacct.percpu` at line 195 is given `metric_type: gauge`, but the field represents "CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup," which is a monotonically increasing cumulative value and should be `metric_type: counter`. Other cpuacct fields in the same file (e.g., `cpuacct.total.ns` at line 179, `cpuacct.stats.user.ns` at line 185, `cpuacct.stats.system.ns` at line 190) are all correctly declared as `counter`. Using `gauge` here will cause TSDB to treat cumulative nanosecond values as point-in-time measurements, leading to incorrect rate calculations and misleading metric aggregations.
packages/elastic_agent/data_stream/elastic_agent_metrics/fields/fields.yml:199
`cpuacct.percpu` at line 199 is assigned `metric_type: gauge` but its description states "CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup." This is cumulative CPU time from the Linux cgroup `cpuacct.usage_percpu` file — a monotonically increasing value that should be `metric_type: counter`. Labeling it as `gauge` causes TSDB to skip counter-specific handling (e.g., rate calculations, counter reset detection), leading to incorrect visualizations and aggregations in Kibana. This directly contradicts the PR's goal of fixing `metric_type` declarations.
packages/elastic_agent/data_stream/filebeat_metrics/fields/fields.yml:200
`cpuacct.percpu` at line 200 is assigned `metric_type: gauge`, but this field represents "CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup" — a monotonically increasing cumulative value that should be `metric_type: counter`. All other `cpuacct` time fields in this same group (`cpuacct.total.ns`, `cpuacct.stats.user.ns`, `cpuacct.stats.system.ns`) are correctly typed as `counter`. Using `gauge` means TSDB will not apply rate/delta calculations correctly for this field, producing incorrect visualizations and aggregations.
packages/elastic_agent/data_stream/fleet_server_metrics/fields/fields.yml:200
`metric_type: gauge` added to `system.process.cgroup.cpuacct.percpu` at line 200 is incorrect. This field represents "CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup," which is a cumulative, monotonically increasing value and should be `metric_type: counter`. All sibling `cpuacct` fields measuring CPU time in nanoseconds (`cpuacct.total.ns` at line 184, `cpuacct.stats.user.ns` at line 190, `cpuacct.stats.system.ns` at line 196) correctly use `metric_type: counter`. Using `gauge` will cause TSDB to treat this cumulative counter as a point-in-time value, breaking rate calculations and counter-based aggregations.
packages/elastic_agent/data_stream/heartbeat_metrics/fields/fields.yml:200
`system.process.cgroup.cpuacct.percpu` at line 200 is given `metric_type: gauge`, but its description says "CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup." CPU time consumed is a monotonically increasing cumulative value and should be `metric_type: counter`, consistent with the sibling fields `cpuacct.total.ns` (line 184), `cpuacct.stats.user.ns` (line 189), and `cpuacct.stats.system.ns` (line 195), which are all `counter`. Using `gauge` causes TSDB to treat this as a point-in-time measurement rather than a cumulative counter, leading to incorrect rate calculations and potentially wrong dashboard visualizations.
packages/elastic_agent/data_stream/metricbeat_metrics/fields/fields.yml:200
`system.process.cgroup.cpuacct.percpu` at line 200 has `metric_type: gauge` added, but this field represents "CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup." Cumulative CPU time consumed is a monotonically increasing value sourced from the cgroup `cpuacct.usage_percpu` file, which makes it a `counter`, not a `gauge`. Using `gauge` will cause TSDB to store and handle this metric incorrectly — for example, rate calculations won't be applied automatically, and rollups/downsampling will use last-value semantics instead of cumulative semantics, leading to incorrect metric values in dashboards and alerts.
packages/elastic_agent/data_stream/osquerybeat_metrics/fields/fields.yml:200
`system.process.cgroup.cpuacct.percpu` at line 200 is assigned `metric_type: gauge`, but its description states "CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup." This is a monotonically increasing cumulative value and should be `metric_type: counter`, consistent with the sibling `cpuacct` fields (`total.ns` at line 184, `stats.user.ns` at line 190, `stats.system.ns` at line 196) which are all declared as `counter`. Using `gauge` causes TSDB to apply incorrect downsampling (last-value instead of counter-appropriate aggregation), producing incorrect results over time.
packages/elastic_agent/data_stream/packetbeat_metrics/fields/fields.yml:200
The newly added `metric_type: gauge` for `system.process.cgroup.cpuacct.percpu` at line 200 is incorrect. This field represents "CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup," which is a monotonically increasing cumulative value and should be `metric_type: counter`, not `gauge`. All sibling fields under `cpuacct` that also represent cumulative CPU time (`total.ns`, `stats.user.ns`, `stats.system.ns`) are correctly declared as `counter`. Using `gauge` will cause TSDB to apply incorrect aggregation (e.g., last-value instead of rate), leading to misleading metric visualizations and broken rate/delta calculations.
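A minimal sketch of the change this comment asks for, assuming the surrounding structure of the `fields.yml` files (exact line positions vary per data stream):

```yaml
# cgroup cpuacct fields: cumulative CPU-time values should all be counters.
- name: percpu
  type: long
  metric_type: counter   # was gauge; cumulative nanoseconds consumed per CPU,
                         # matching the sibling cpuacct.*.ns counter fields
  description: CPU time (in nanoseconds) consumed on each CPU by all tasks in this cgroup.
```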
- name: memory.total
  type: long
  metric_type: gauge
🟡 Medium fields/beat-stats-fields.yml:134
beat.stats.memstats.memory.total is labeled metric_type: gauge, but this field maps to Go's runtime.MemStats.TotalAlloc which tracks cumulative bytes allocated for heap objects — a monotonically increasing counter. With metric_type: gauge, TSDB will not apply counter-specific handling like rate calculations, producing incorrect aggregations and misleading dashboard visualizations. Change to metric_type: counter.
  - name: memory.total
    type: long
+   metric_type: counter
Also found in 2 other location(s)
packages/elastic_agent/data_stream/auditbeat_metrics/fields/beat-stats-fields.yml:136
`memstats.memory.total` at line 136 is declared as `metric_type: gauge`, but this field represents Go's `runtime.MemStats.TotalAlloc` — cumulative bytes allocated for heap objects. The Beats documentation explicitly describes it as "Cumulative bytes allocated for heap objects" (see `docs/reference/filebeat/understand-filebeat-logs.md` in the beats repo), and sample log data confirms the value increases monotonically (e.g., 48348409672 → 48352988904 → 48353325376). Since it's a monotonically increasing cumulative value, it should be `metric_type: counter`, not `gauge`. Using `gauge` means TSDB won't apply counter-specific handling (e.g., rate calculations, rollups), leading to incorrect metric interpretation by dashboards and alerting.
packages/elastic_agent/data_stream/filebeat_input_metrics/fields/beat-stats-fields.yml:136
`memstats.memory.total` at line 136 is declared as `metric_type: gauge` but it corresponds to Go's `runtime.MemStats.TotalAlloc`, which is a cumulative counter of bytes allocated for heap objects. The Beats documentation explicitly describes it as "Cumulative bytes allocated for heap objects" and sample log data from the Beats repo confirms it is monotonically increasing. It should be `metric_type: counter`. Incorrect `metric_type` causes TSDB to apply wrong downsampling/aggregation logic — for a counter, rate calculations are appropriate, but labeling it as gauge means TSDB may take last-value samples instead of computing rates, leading to incorrect metric aggregations over time.
🚀 Benchmarks report
💚 Build Succeeded
cc @AndersonQ
Proposed commit message
Checklist
- I have added an entry to the `changelog.yml` file.
Author's Checklist
How to test this PR locally
Related issues
Screenshots