
Emit metric tracking empty responses from prometheus#7671

Open
aliaqel-stripe wants to merge 12 commits into kedacore:main from aliaqel-stripe:drolando/add_empty_response_metric

Conversation

@aliaqel-stripe
Contributor

@aliaqel-stripe aliaqel-stripe commented Apr 20, 2026

We'd like a way to monitor the number of KEDA errors caused by empty responses from Prometheus, after enabling the ignoreNullValues flag for most of our Prometheus triggers.

Right now this error gets logged, but the error metric that KEDA emits is generic and doesn't differentiate by error type.

The metric keda_scaler_empty_upstream_responses_total is labeled with namespace, scaledObject, and triggerName so operators can identify which scaler is producing empty upstream responses.

Tests

E2e tests have been added to tests/sequential/prometheus_metrics/ and tests/sequential/opentelemetry_metrics/ that deploy a real Prometheus instance and verify the metric is emitted with the correct labels when a query returns an empty result.

Checklist

Fixes #7062

drolando-stripe and others added 5 commits April 19, 2026 23:39
Signed-off-by: Daniele Rolando <drolando@stripe.com>
Signed-off-by: Daniele Rolando <drolando@stripe.com>
Signed-off-by: Daniele Rolando <drolando@stripe.com>
Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Signed-off-by: drolando-stripe <102543345+drolando-stripe@users.noreply.github.com>
Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Signed-off-by: drolando-stripe <102543345+drolando-stripe@users.noreply.github.com>
@aliaqel-stripe aliaqel-stripe requested a review from a team as a code owner April 20, 2026 18:20
@github-actions

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry to our changelog in alphabetical order and link the related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • Make sure GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@keda-automation keda-automation requested a review from a team April 20, 2026 18:20
@snyk-io

snyk-io Bot commented Apr 20, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues


…nse metric

Add labels to keda_scaler_empty_upstream_responses_total so operators can
identify which scaler is producing empty upstream responses. Also add e2e
tests for both Prometheus and OpenTelemetry metric backends.

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@aliaqel-stripe aliaqel-stripe force-pushed the drolando/add_empty_response_metric branch from 5840abf to 2edc55e Compare April 20, 2026 18:25
Member

@wozniakjan wozniakjan left a comment


minor nits below for your consideration

logger: logger,
scalableObjectName: config.ScalableObjectName,
scalableObjectNS: config.ScalableObjectNamespace,
triggerName: config.TriggerName,
Member


triggerName can be an empty string; would it make sense to add triggerIndex too? That should be readily available from the same config struct.

Contributor Author


or metric_name?

Contributor Author


adding metric_name

Contributor Author

@aliaqel-stripe aliaqel-stripe Apr 22, 2026


trigger index is kind of useless because it's just a number, and when you have 2000+ scaled objects in a cluster it doesn't give any useful info

I wonder if we should explore removing it from other metrics?

Comment thread pkg/scalers/prometheus_scaler.go Outdated
if s.metadata.IgnoreNullValues {
return 0, nil
}
metricscollector.RecordEmptyUpstreamResponse(s.scalableObjectNS, s.scalableObjectName, s.triggerName)
Member


is it desired to record the metric even when IgnoreNullValues is set to true? The value of IgnoreNullValues can be added as yet another label so users can filter which one they care about. This could surface broken prometheus queries that have been masked by IgnoreNullValues

Contributor Author


adding ignorenullvalues as a label

Comment thread pkg/scalers/prometheus_scaler.go Outdated
@wozniakjan wozniakjan added the Awaiting/2nd-approval This PR needs one more approval review label Apr 22, 2026
@wozniakjan wozniakjan requested a review from Copilot April 22, 2026 11:05
@wozniakjan
Member

wozniakjan commented Apr 22, 2026

/run-e2e prometheus
Update: You can check the progress here

Contributor

Copilot AI left a comment


Pull request overview

Adds a dedicated metric to track Prometheus-scaler empty query responses so operators can distinguish this failure mode from generic scaler errors, along with sequential e2e coverage validating labels/attributes for both Prometheus and OpenTelemetry metric pipelines.

Changes:

  • Add keda_scaler_empty_upstream_responses_total (Prometheus) and keda.scaler.empty.upstream.responses (OpenTelemetry) counters with namespace/scaledObject/triggerName labeling.
  • Record the counter from the Prometheus scaler when queries return empty results (when ignoreNullValues=false).
  • Extend sequential e2e tests to deploy a Prometheus instance returning empty results and assert the metric is emitted with expected labels.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/scalers/prometheus_scaler.go Records the new “empty upstream response” metric when Prometheus query results/value arrays are empty.
pkg/metricscollector/prommetrics.go Defines/registers the new Prometheus CounterVec and exposes a recorder method.
pkg/metricscollector/opentelemetry.go Defines/registers the new OTel counter and records it with attributes.
pkg/metricscollector/metricscollectors.go Extends the collector interface and adds a dispatcher function for the new metric.
tests/sequential/prometheus_metrics/prometheus_metrics_test.go Adds sequential test validating the Prometheus-exported metric and labels.
tests/sequential/opentelemetry_metrics/opentelemetry_metrics_test.go Adds sequential test validating the Prometheus-exported view of OTel metric and labels.
CHANGELOG.md Documents the new Prometheus scaler metric in the Unreleased Improvements section.


Comment thread pkg/metricscollector/prommetrics.go Outdated
Comment thread CHANGELOG.md Outdated
@rickbrouwer rickbrouwer added the waiting-author-response All PR's or Issues where we are waiting for a response from the author label Apr 22, 2026
- Rename scaledObject label -> scaledResource, add metricName, resourceType,
  and ignoreNullValues labels to keda_scaler_empty_upstream_responses_total
- Record metric unconditionally (before IgnoreNullValues guard) so masked
  empty responses are also visible, with ignoreNullValues label for filtering
- Fix CHANGELOG: reference issue kedacore#7062 instead of PR kedacore#7060, capitalize Prometheus
- Update e2e tests to assert new labels

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@keda-automation keda-automation requested a review from a team April 22, 2026 16:30
@aliaqel-stripe
Contributor Author

@rickbrouwer ready for 2nd review.

@rickbrouwer rickbrouwer removed the waiting-author-response All PR's or Issues where we are waiting for a response from the author label Apr 24, 2026
@rickbrouwer rickbrouwer requested a review from Copilot April 24, 2026 14:26
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pkg/scalers/prometheus_scaler.go:273

  • ExecutePromQuery treats any response with len(result.Data.Result)==0 as an "empty upstream response" and now records the empty-upstream metric. However the Prometheus HTTP API can return HTTP 200 with JSON status: "error" (e.g., invalid query), where data.result will also be empty in this struct. That would incorrectly increment the empty-upstream counter for query errors. Consider checking result.Status (and/or modeling the errorType/error fields) and returning an error before recording the empty-upstream metric unless status == "success".
	var v float64 = -1

	// allow for zero element or single element result sets
	if len(result.Data.Result) == 0 {
		metricscollector.RecordEmptyUpstreamResponse(s.scalableObjectNS, s.scalableObjectName, s.triggerName, s.metricName, s.resourceType, s.metadata.IgnoreNullValues)
		if s.metadata.IgnoreNullValues {
			return 0, nil
		}
		return -1, fmt.Errorf("prometheus metrics 'prometheus' target may be lost, the result is empty")


Comment thread pkg/metricscollector/opentelemetry.go
Comment on lines +1376 to +1381
time.Sleep(15 * time.Second)

family := fetchAndParsePrometheusMetrics(t, fmt.Sprintf("curl --insecure %s", kedaOperatorCollectorPrometheusExportURL))
val, ok := family["keda_scaler_empty_upstream_responses_total"]
assert.True(t, ok, "keda_scaler_empty_upstream_responses_total not available")
if ok {

Copilot AI Apr 24, 2026


This test relies on a fixed time.Sleep(15 * time.Second) before scraping metrics. On slower clusters/CI runs the metric may not be exported yet, causing intermittent failures. Consider polling until keda_scaler_empty_upstream_responses_total is present with the expected labels (with an overall timeout), similar to how the Prometheus-metrics e2e test waits for metrics to appear.

@JorTurFer
Member

Could you add another PR to docs adding the OTel metric? I see that the Prometheus one is already merged, but the OTel one is pending 😅

@aliaqel-stripe
Contributor Author

Could you add another PR to docs adding the OTel metric? I see that the Prometheus one is already merged, but the OTel one is pending 😅

kedacore/keda-docs#1751

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@keda-automation keda-automation requested a review from a team April 28, 2026 01:18
Signed-off-by: aliaqel-stripe <120822631+aliaqel-stripe@users.noreply.github.com>
@rickbrouwer
Member

rickbrouwer commented May 7, 2026

/run-e2e prometheus
Update: You can check the progress here

@rickbrouwer rickbrouwer added ok-to-merge This PR can be merged waiting-for-e2e and removed Awaiting/2nd-approval This PR needs one more approval review waiting-for-e2e labels May 7, 2026

Labels

ok-to-merge This PR can be merged


Development

Successfully merging this pull request may close these issues.

Emit a metric tracking the number of empty responses from prometheus

6 participants