
fix(webhook): use field indexes in verifyScaledObjects to avoid O(N) admission cost#7681

Open
ggarb wants to merge 2 commits into kedacore:main from ggarb:fix-webhook-verifyScaledObjects-indexer

Conversation


@ggarb ggarb commented Apr 23, 2026

Problem

verifyScaledObjects performs two conflict checks on every ScaledObject admission:

  1. Does any existing SO in this namespace already manage the same scaleTargetRef?
  2. Does any existing SO already own the same HPA name?

Both checks called kc.List (the controller-runtime cached client) without any field filter, fetching every ScaledObject in the namespace. Because the cached client DeepCopies every returned item to prevent callers from mutating shared state, this allocates O(N × object_size) per admission.

verifyHpas had the same problem: an unfiltered kc.List over all HPAs in the namespace on every admission, allocating the entire namespace's HPA list regardless of how many HPAs are actually relevant (typically 0–1).
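For illustration, a minimal sketch of the pre-fix shape of the check (not the exact KEDA source; the helper name is hypothetical, the types are the real v1alpha1 API types):

```go
package v1alpha1

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// findScaleTargetConflict is a hypothetical condensation of the pre-fix pattern.
func findScaleTargetConflict(ctx context.Context, kc client.Client, incoming *ScaledObject) (bool, error) {
	var soList ScaledObjectList
	// Unfiltered, namespace-wide List: the cached client DeepCopies every
	// ScaledObject in the namespace into soList before any of them is inspected.
	if err := kc.List(ctx, &soList, client.InNamespace(incoming.Namespace)); err != nil {
		return false, err
	}
	for _, existing := range soList.Items {
		if existing.Name == incoming.Name {
			continue // skip the object being admitted
		}
		// scaleTargetRef is a required field, so it is assumed to be set here.
		if existing.Spec.ScaleTargetRef.Name == incoming.Spec.ScaleTargetRef.Name {
			return true, nil // another SO already manages this workload
		}
	}
	return false, nil
}
```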

Measured impact

Heap profile during 10k ScaledObject creation burst on KEDA 2.19:

github.com/kedacore/keda/v2/apis/keda/v1alpha1.verifyScaledObjects   106 MB  (71 % of total)
  └─ apis/keda/v1alpha1.(*ScaledObject).DeepCopy                     85 MB  (57 %)
     └─ meta/v1.(*ObjectMeta).DeepCopyInto / (*FieldsV1).DeepCopyInto / ...

RSS during burst reached 748 MiB at ~9k SOs, spiking erratically. At 60k SOs per namespace, each admission allocates ~900 MiB; at 30 admissions/s creation rate this is ~27 GB/s of allocation — enough to overwhelm MADV_FREE and cause the webhook OOMKill loop seen in production-scale tests.

Fix

Register three controller-runtime field indexes in SetupWebhookWithManager:

  • spec.scaleTargetRef.name on ScaledObject — the target workload name
  • spec.hpaName on ScaledObject — the computed or explicit HPA name (via getHpaName)
  • spec.scaleTargetRef.name on HPA — used by verifyHpas to narrow to HPAs targeting the same workload

verifyScaledObjects then issues two narrow client.MatchingFields queries instead of one full-namespace scan. verifyHpas issues one narrow query instead of listing every HPA. In the common case (no duplicates) each query returns 0–1 items; DeepCopy cost collapses from O(N × object_size) to O(1 × object_size) regardless of namespace scale.
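A sketch of the change, assuming the registration lives in SetupWebhookWithManager: the constant names follow the ones quoted in review below, while the function names (registerWebhookIndexes, listSameTarget) are illustrative rather than the exact identifiers in the diff; getHpaName is the existing helper mentioned above.

```go
package v1alpha1

import (
	"context"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const (
	scaleTargetRefNameIdx = "spec.scaleTargetRef.name"
	hpaNameIdx            = "spec.hpaName"
	// Index names are scoped per GVK, so the HPA index can reuse the same path string.
	hpaScaleTargetNameIdx = "spec.scaleTargetRef.name"
)

// registerWebhookIndexes (hypothetical name) registers the three field indexes.
func registerWebhookIndexes(ctx context.Context, mgr ctrl.Manager) error {
	indexer := mgr.GetFieldIndexer()
	// ScaledObjects by the workload name they target.
	if err := indexer.IndexField(ctx, &ScaledObject{}, scaleTargetRefNameIdx, func(obj client.Object) []string {
		return []string{obj.(*ScaledObject).Spec.ScaleTargetRef.Name}
	}); err != nil {
		return err
	}
	// ScaledObjects by the HPA name they will own (computed default or explicit override).
	if err := indexer.IndexField(ctx, &ScaledObject{}, hpaNameIdx, func(obj client.Object) []string {
		return []string{getHpaName(*obj.(*ScaledObject))}
	}); err != nil {
		return err
	}
	// HPAs by the workload name they target, for verifyHpas.
	return indexer.IndexField(ctx, &autoscalingv2.HorizontalPodAutoscaler{}, hpaScaleTargetNameIdx, func(obj client.Object) []string {
		return []string{obj.(*autoscalingv2.HorizontalPodAutoscaler).Spec.ScaleTargetRef.Name}
	})
}

// One of the two narrowed checks in verifyScaledObjects: the cached client
// now DeepCopies only the 0-1 ScaledObjects whose index value matches.
func listSameTarget(ctx context.Context, kc client.Client, incoming *ScaledObject) (*ScaledObjectList, error) {
	var soList ScaledObjectList
	err := kc.List(ctx, &soList,
		client.InNamespace(incoming.Namespace),
		client.MatchingFields{scaleTargetRefNameIdx: incoming.Spec.ScaleTargetRef.Name})
	return &soList, err
}
```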

Validation

verifyScaledObjects fix — 10k creation burst, before and after:

| Metric | Before | After |
| --- | --- | --- |
| Peak webhook RSS during burst | 748 MiB | 111 MiB |
| verifyScaledObjects inuse heap | 106 MB (71 %) | 0 % |
| Total inuse heap at 10k SOs | 148 MiB | 56 MiB |
| Growth pattern | Volatile spikes + GC thrash | Smooth ~11 MiB/1k SOs (informer cache only) |

verifyHpas fix — 60k creation burst, before and after (with verifyScaledObjects fix applied):

| Metric | Before | After |
| --- | --- | --- |
| Webhook OOMKills (20 GiB limit) | 12 | 0 |
| Webhook RSS at peak | >20 GiB | ~1 MiB |
| Operator restarts | 0 | 0 |

Post-fix, the dominant inuse allocations are cache.storeIndex.addKeyToIndex (the indexer populating as SOs arrive) — expected steady-state cost, proportional to N.

Extrapolated to 60k SOs: peak ~666 MiB, well within a 2 GiB webhook limit. Pre-fix, 20 GiB was insufficient.

Notes

Checklist

  • Tests have been added (non-envtest unit tests pass; envtest suite validates indexed List but requires KUBEBUILDER_ASSETS — runs in CI)
  • Changelog has been updated (Fixes → General)
  • Commits are signed (DCO)
  • New scaler — N/A
  • Schema regen — N/A
  • Helm chart PR — N/A
  • Docs PR — N/A

Relates to #7670

@ggarb ggarb requested a review from a team as a code owner April 23, 2026 16:24
@keda-automation keda-automation requested a review from a team April 23, 2026 16:24

snyk-io Bot commented Apr 23, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- | --- |
|  | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

…admission cost

verifyScaledObjects performs two duplicate-conflict checks on every
ScaledObject admission: one for duplicate scaleTargetRef and one for
duplicate HPA name. Both checks listed ALL ScaledObjects in the
namespace via kc.List, which — because controller-runtime's cached
client DeepCopies every returned item — allocated O(N) memory per
admission.

Measured impact with a heap profile during 10k SO creation burst:
verifyScaledObjects consumed 71 % of inuse_space (106 MB at peak).
At 60k SOs the list allocates ~900 MB per admission; at 30/s creation
rate that is ~27 GB/s of allocation, which outpaces MADV_FREE and
causes the webhook OOMKill loop seen in production-scale tests.

The fix registers two controller-runtime field indexes in
SetupWebhookWithManager:

  spec.scaleTargetRef.name  → so.Spec.ScaleTargetRef.Name
  spec.hpaName              → getHpaName(*so)   (computed default or explicit override)

verifyScaledObjects now issues two narrow indexed List calls (each
returning 0–1 items in the common case) instead of one full-namespace
scan. Per-admission allocation drops from O(N * object_size) to
O(1 * object_size) regardless of cluster scale.

Pairs with kedacore#7670 (remove eager MarshalIndent from the same loop) for
a complete webhook memory fix at scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Greg Garber <ggarb@netflix.com>
@ggarb ggarb force-pushed the fix-webhook-verifyScaledObjects-indexer branch from 12a232e to 966b7c5 Compare April 23, 2026 16:26
ggarb added a commit to ggarb/keda that referenced this pull request Apr 23, 2026
…rifyHpas

Three changes to reduce webhook memory pressure during SO creation bursts:

1. Replace unconditional json.MarshalIndent calls in ValidateCreate,
   ValidateUpdate, isRemovingFinalizer, and the verifyHpas loop with
   structured logr key-value logging. The marshals ran on every admission
   even when V(1) logging was disabled, generating 60-100 KB of transient
   garbage per request — at burst=60 this outpaced MADV_FREE and caused
   repeated OOMKills despite the 20 GiB limit.

2. Replace isRemovingFinalizer's JSON string comparison with
   reflect.DeepEqual, eliminating two spec marshals per update admission.

3. Add hpaScaleTargetNameIdx field index for HPA objects and switch
   verifyHpas to an indexed List, reducing it from O(N_hpas) to O(1)
   — same fix pattern as the A1d verifyScaledObjects change (kedacore#7681).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
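In code terms, changes 1 and 2 above boil down to something like the following sketch; logger names, message strings, and helper names are illustrative, not the exact diff.

```go
package v1alpha1

import (
	"encoding/json"
	"reflect"

	"github.com/go-logr/logr"
)

// Change 1, before: MarshalIndent runs on every admission, even when V(1)
// logging is disabled, producing tens of KB of transient garbage per request.
func logIncomingBefore(logger logr.Logger, so *ScaledObject) {
	specBytes, _ := json.MarshalIndent(so.Spec, "", "  ")
	logger.V(1).Info("validating incoming ScaledObject: " + string(specBytes))
}

// Change 1, after: structured key-value logging skips the marshal entirely.
func logIncomingAfter(logger logr.Logger, so *ScaledObject) {
	logger.V(1).Info("validating incoming ScaledObject",
		"name", so.Name, "namespace", so.Namespace,
		"scaleTargetRef", so.Spec.ScaleTargetRef.Name)
}

// Change 2: compare specs directly instead of marshaling both to JSON strings.
func specsEqual(oldSO, newSO *ScaledObject) bool {
	return reflect.DeepEqual(oldSO.Spec, newSO.Spec)
}
```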
verifyHpas issued an unfiltered kc.List over all HPAs in the namespace on
every ScaledObject admission. At 60k HPAs this allocates the entire
namespace's HPA list per admission — the same O(N) anti-pattern fixed for
verifyScaledObjects in the previous commit.

Add hpaScaleTargetNameIdx (spec.scaleTargetRef.name on HPA objects) in
SetupWebhookWithManager and switch verifyHpas to an indexed List, narrowing
candidates to 0–1 HPAs that target the same workload. Also replace the
per-HPA json.MarshalIndent debug log with a structured logr call since the
marshal ran unconditionally regardless of log level.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
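A sketch of the narrowed lookup that replaces the unfiltered HPA list, assuming the HPA index registered in SetupWebhookWithManager; the helper name is illustrative.

```go
package v1alpha1

import (
	"context"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listHpasTargetingWorkload (hypothetical name) is the narrowed verifyHpas lookup.
func listHpasTargetingWorkload(ctx context.Context, kc client.Client, incoming *ScaledObject) (*autoscalingv2.HorizontalPodAutoscalerList, error) {
	var hpaList autoscalingv2.HorizontalPodAutoscalerList
	// Only HPAs whose spec.scaleTargetRef.name matches the incoming SO's
	// target are returned: 0-1 items instead of every HPA in the namespace.
	err := kc.List(ctx, &hpaList,
		client.InNamespace(incoming.Namespace),
		client.MatchingFields{"spec.scaleTargetRef.name": incoming.Spec.ScaleTargetRef.Name})
	return &hpaList, err
}
```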
@ggarb ggarb force-pushed the fix-webhook-verifyScaledObjects-indexer branch from 807ea2a to 6e80753 Compare April 23, 2026 22:25
@JorTurFer JorTurFer requested a review from Copilot April 24, 2026 22:21
Contributor

Copilot AI left a comment


Pull request overview

This PR reduces admission webhook memory/CPU overhead at scale by replacing full-namespace list scans in ScaledObject/HPA validation with controller-runtime cache field-index lookups.

Changes:

  • Register cache field indexes for ScaledObject spec.scaleTargetRef.name, computed HPA name, and HPA spec.scaleTargetRef.name during webhook setup.
  • Update verifyScaledObjects and verifyHpas to use client.MatchingFields queries instead of unfiltered List calls.
  • Update the changelog to document the performance fix.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
apis/keda/v1alpha1/scaledobject_webhook.go Adds field indexes and switches validation list operations to indexed lookups to avoid O(N) DeepCopy cost per admission.
CHANGELOG.md Documents the webhook performance improvement under Fixes → General.


Comment on lines +55 to +59
// Field index names used by verifyScaledObjects to avoid O(N) full-namespace
// list scans on every SO admission. Without these indexes each admission must
// DeepCopy every ScaledObject in the namespace to find conflicts; at 60k SOs
// each admission allocates ~900 MiB, which is why the webhook OOMs under
// creation bursts. The indexes narrow candidates to 0–1 objects.

Copilot AI Apr 24, 2026


The comment claims the indexes narrow candidates to “0–1 objects” and describes the lookups as O(1). In practice the ScaledObject index is only on spec.scaleTargetRef.name, so a namespace can legitimately have multiple ScaledObjects targeting different GVKs with the same name (e.g., Deployment "foo" and StatefulSet "foo"), making the lookup O(k) where k is the number of matches for that name. Consider rewording this comment to avoid implying a strict O(1)/0–1 guarantee.
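Put concretely, the residual filtering the comment describes would look roughly like the sketch below (a function in package v1alpha1; the helper name and exact comparison are illustrative, and the real code may also normalize defaults such as an empty Kind meaning Deployment):

```go
// conflictsWith is illustrative only: the index narrows candidates by target
// name, so the loop still has to compare kind/apiVersion to rule out cases
// like Deployment "foo" vs StatefulSet "foo".
func conflictsWith(incoming *ScaledObject, candidates []ScaledObject) bool {
	for _, existing := range candidates {
		if existing.Name == incoming.Name {
			continue // the object currently being admitted
		}
		sameKind := existing.Spec.ScaleTargetRef.Kind == incoming.Spec.ScaleTargetRef.Kind
		sameAPIVersion := existing.Spec.ScaleTargetRef.APIVersion == incoming.Spec.ScaleTargetRef.APIVersion
		if sameKind && sameAPIVersion {
			return true // same GVK and same target name: a genuine duplicate
		}
	}
	return false
}
```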

Member

@JorTurFer JorTurFer left a comment


wow! I didn't know about this optimisation and it looks really nice!

Could you fix DCO check?

@JorTurFer
Member

JorTurFer commented Apr 24, 2026

/run-e2e internal
Update: You can check the progress here

@rickbrouwer rickbrouwer added Awaiting/2nd-approval This PR needs one more approval review waiting-author-response All PR's or Issues where we are waiting for a response from the author labels Apr 25, 2026
Member

@rickbrouwer rickbrouwer left a comment


Could you add unit tests covering the new indexed lookups? I would really like to see some test cases here.

Comment on lines +61 to +67
scaleTargetRefNameIdx = "spec.scaleTargetRef.name"
hpaNameIdx = "spec.hpaName"
// hpaScaleTargetNameIdx indexes HPA objects by spec.scaleTargetRef.name so
// verifyHpas can issue an O(1) lookup instead of listing every HPA in the
// namespace. Index names are scoped per-GVK so reusing the same path string
// as scaleTargetRefNameIdx is safe.
hpaScaleTargetNameIdx = "spec.scaleTargetRef.name"
Member


The consts scaleTargetRefNameIdx and hpaScaleTargetNameIdx both have the same value.

Contributor

@dttung2905 dttung2905 left a comment


Thanks for the PR, great analysis btw 🚀 We still need DCO to be signed. I might have more questions as I go into this in more detail 🙇

ctx := context.Background()

// Check 1: no other SO in this namespace already manages the same workload.
// Use the scaleTargetRef.name index so only SOs targeting the same resource
Contributor


After listing only by scaleTargetRef.name, the loop still must filter by GVK. Can you add or point to a test that has two targets named foo (e.g. Deployment + StatefulSet) and proves no false positive?
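For what it's worth, a unit test along these lines would exercise that case; all identifiers are hypothetical, and it assumes a controller-runtime version whose fake client supports field indexes via WithIndex.

```go
package v1alpha1

import (
	"context"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

func TestSameTargetNameDifferentKindIsNotAConflict(t *testing.T) {
	scheme := runtime.NewScheme()
	if err := AddToScheme(scheme); err != nil {
		t.Fatal(err)
	}

	// Incoming SO targets Deployment "foo"; an existing SO targets StatefulSet "foo".
	deploySO := &ScaledObject{
		ObjectMeta: metav1.ObjectMeta{Name: "so-deploy", Namespace: "default"},
		Spec:       ScaledObjectSpec{ScaleTargetRef: &ScaleTarget{Name: "foo", Kind: "Deployment"}},
	}
	stsSO := &ScaledObject{
		ObjectMeta: metav1.ObjectMeta{Name: "so-sts", Namespace: "default"},
		Spec:       ScaledObjectSpec{ScaleTargetRef: &ScaleTarget{Name: "foo", Kind: "StatefulSet"}},
	}

	kc := fake.NewClientBuilder().
		WithScheme(scheme).
		WithObjects(stsSO).
		WithIndex(&ScaledObject{}, "spec.scaleTargetRef.name", func(obj client.Object) []string {
			return []string{obj.(*ScaledObject).Spec.ScaleTargetRef.Name}
		}).
		Build()

	// The indexed List returns the StatefulSet-targeting SO, because the index
	// key is only the target name.
	var got ScaledObjectList
	if err := kc.List(context.Background(), &got,
		client.InNamespace("default"),
		client.MatchingFields{"spec.scaleTargetRef.name": "foo"}); err != nil {
		t.Fatal(err)
	}
	if len(got.Items) != 1 {
		t.Fatalf("expected 1 indexed candidate, got %d", len(got.Items))
	}

	// The validation loop must then compare kinds and report no conflict.
	for _, existing := range got.Items {
		if existing.Spec.ScaleTargetRef.Kind == deploySO.Spec.ScaleTargetRef.Kind {
			t.Fatalf("unexpected conflict with %s: different kinds must not collide", existing.Name)
		}
	}
}
```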
