fix(webhook): eliminate per-admission json.MarshalIndent and index verifyHpas by ggarb · Pull Request #1 · ggarb/keda

ggarb · 2026-04-23T22:18:51Z

Problem

During a 60k ScaledObject creation burst, the admission webhook was OOMKilled 12 times
despite a 20 GiB memory limit. There were three root causes:

1. Unconditional json.MarshalIndent on every admission

ValidateCreate, ValidateUpdate, and isRemovingFinalizer each called
json.MarshalIndent to build a debug log string, even when V(1) logging was
disabled. At burst=60 this produced ~60–100 KB of transient garbage per
admission. Under sustained load, Go's allocator could not return that memory via MADV_FREE
fast enough, so RSS climbed until the pod was OOMKilled.

2. isRemovingFinalizer JSON-string spec comparison

isRemovingFinalizer marshaled both so.Spec and oldSo.Spec to JSON strings in order to compare
them, allocating O(spec_size) on every update admission. A structural comparison with
reflect.DeepEqual does the same job without serializing anything.
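The sketch below contrasts the two comparison styles on a hypothetical, heavily simplified spec type (not KEDA's real ScaledObjectSpec). It also shows one behavioral difference that is worth knowing about before swapping one comparison for the other.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// spec is a hypothetical, simplified stand-in for ScaledObjectSpec.
type spec struct {
	ScaleTargetName string   `json:"scaleTargetName"`
	MinReplicas     *int32   `json:"minReplicaCount,omitempty"`
	Triggers        []string `json:"triggers,omitempty"`
}

// specChangedJSON mirrors the old approach: serialize both, compare strings.
// Every call allocates two full JSON encodings.
func specChangedJSON(a, b spec) bool {
	ja, _ := json.Marshal(a)
	jb, _ := json.Marshal(b)
	return string(ja) != string(jb)
}

// specChangedDeepEqual compares the structs directly without serializing.
func specChangedDeepEqual(a, b spec) bool {
	return !reflect.DeepEqual(a, b)
}

func main() {
	one, two := int32(1), int32(1)
	a := spec{ScaleTargetName: "web", MinReplicas: &one, Triggers: []string{"cpu"}}
	b := spec{ScaleTargetName: "web", MinReplicas: &two, Triggers: []string{"cpu"}}
	// DeepEqual follows pointers, so distinct pointers to equal values match.
	fmt.Println(specChangedJSON(a, b), specChangedDeepEqual(a, b)) // false false

	// Difference: with omitempty, nil and empty slices encode identically,
	// but reflect.DeepEqual treats a nil slice and an empty slice as unequal.
	c := spec{ScaleTargetName: "web", Triggers: nil}
	d := spec{ScaleTargetName: "web", Triggers: []string{}}
	fmt.Println(specChangedJSON(c, d), specChangedDeepEqual(c, d)) // false true
}
```

In practice the nil-versus-empty case means DeepEqual can report a change that the JSON comparison would not. For a finalizer-removal check, that errs on the side of treating the update as a spec change.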

3. verifyHpas full-namespace HPA list (O(N))

verifyHpas issued an unfiltered kc.List over all HPAs in the namespace. With 60k
HPAs, every SO admission therefore materialized the namespace's entire HPA list. Fixed
with a spec.scaleTargetRef.name field index (the same pattern as kedacore#7681).
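Below is a stdlib-only model of what a field index buys. It is not controller-runtime's API. The hypothetical store copies every object it returns, standing in for the cached client's DeepCopy, and maintains a secondary map keyed by scaleTargetRef.name on write. The copy counter makes the O(N) versus O(matches) difference visible.

```go
package main

import "fmt"

// hpa is a hypothetical, minimal stand-in for an autoscaling/v2 HPA.
type hpa struct {
	Name            string
	ScaleTargetKind string
	ScaleTargetName string
}

// store mimics a cache that copies every object it hands out, plus an
// index keyed by scaleTargetRef.name that is updated on every add.
type store struct {
	items  []*hpa
	byName map[string][]*hpa
	copies int // objects copied out to callers
}

func newStore() *store { return &store{byName: map[string][]*hpa{}} }

func (s *store) add(h *hpa) {
	s.items = append(s.items, h)
	s.byName[h.ScaleTargetName] = append(s.byName[h.ScaleTargetName], h)
}

func (s *store) copyOut(src []*hpa) []hpa {
	out := make([]hpa, 0, len(src))
	for _, h := range src {
		out = append(out, *h) // stands in for DeepCopy
		s.copies++
	}
	return out
}

// listAll models the unfiltered List: every HPA in the namespace is copied.
func (s *store) listAll() []hpa { return s.copyOut(s.items) }

// listByTargetName models the indexed List: only matching HPAs are copied.
func (s *store) listByTargetName(name string) []hpa { return s.copyOut(s.byName[name]) }

func main() {
	s := newStore()
	for i := 0; i < 60000; i++ {
		s.add(&hpa{Name: fmt.Sprintf("keda-hpa-%d", i), ScaleTargetKind: "Deployment", ScaleTargetName: fmt.Sprintf("app-%d", i)})
	}

	// Old path: copy everything, then filter in the webhook.
	s.copies = 0
	matches := 0
	for _, h := range s.listAll() {
		if h.ScaleTargetName == "app-42" {
			matches++
		}
	}
	fmt.Println("full scan copies:", s.copies, "matches:", matches) // full scan copies: 60000 matches: 1

	// New path: the index narrows candidates before anything is copied.
	s.copies = 0
	got := s.listByTargetName("app-42")
	fmt.Println("indexed copies:", s.copies, "matches:", len(got)) // indexed copies: 1 matches: 1
}
```

In controller-runtime the corresponding pieces are a FieldIndexer.IndexField registration at setup (via the manager's GetFieldIndexer()) and a List call with client.MatchingFields. This sketch does not reproduce the PR's actual code.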

Fix

1. Replace every json.MarshalIndent + fmt.Sprintf log call with structured
logr.Info(msg, key, value), so nothing is marshaled when the log level is inactive.
2. Replace the JSON string comparison in isRemovingFinalizer with reflect.DeepEqual.
3. Register a spec.scaleTargetRef.name field index on HPA objects in
SetupWebhookWithManager and switch verifyHpas to an indexed List.

Test results

Validated on a 60k ScaledObject creation burst (same cluster and methodology as kedacore#7681):

| Metric | Before (fix-webhook-verifyobj-12a232e2) | After (`807ea2a`) |
|---|---|---|
| Webhook OOMKills (20 GiB limit) | 12 | 0 |
| Webhook RSS at peak | >20 GiB | ~1 MiB |
| Operator restarts | 0 | 0 |

This is a companion to kedacore#7681. Together they take the webhook from 12 OOMKills
during a 60k SO creation burst to none, with negligible peak memory.


When KEDA runs in environments where /sys/fs/cgroup/cpu.max is not readable (e.g. EKS auto-mode, restricted SecurityContexts), maxprocs.Set() returns a permission error and KEDA crashes on startup. The Go runtime handles GOMAXPROCS sensibly without explicit configuration, so this error should not be fatal. Log it as a warning and continue startup.

Fixes kedacore#7653

Signed-off-by: ManvithaP-hub <62259625+ManvithaP-hub@users.noreply.github.com>

…admission cost

verifyScaledObjects performs two duplicate-conflict checks on every ScaledObject admission: one for a duplicate scaleTargetRef and one for a duplicate HPA name. Both checks listed ALL ScaledObjects in the namespace via kc.List. Because controller-runtime's cached client DeepCopies every returned item, this allocated O(N) memory per admission.

Measured impact with a heap profile during a 10k SO creation burst: verifyScaledObjects consumed 71% of inuse_space (106 MB at peak). At 60k SOs the list allocates ~900 MB per admission. At a creation rate of 30/s that is ~27 GB/s of allocation, which outpaces MADV_FREE and causes the webhook OOMKill loop seen in production-scale tests.

The fix registers two controller-runtime field indexes in SetupWebhookWithManager:

1. spec.scaleTargetRef.name → so.Spec.ScaleTargetRef.Name
2. spec.hpaName → getHpaName(*so) (computed default or explicit override)

verifyScaledObjects now issues two narrow indexed List calls (each returning 0–1 items in the common case) instead of one full-namespace scan. Per-admission allocation drops from O(N * object_size) to O(1 * object_size) regardless of cluster scale.

Pairs with kedacore#7670 (remove eager MarshalIndent from the same loop) for a complete webhook memory fix at scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Greg Garber <ggarb@netflix.com>
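A stdlib-only sketch of the two-index duplicate check follows. Treat the following as assumptions rather than KEDA code:

- The scaledObject type, the index map, and the extractor functions are hypothetical.
- hpaName assumes KEDA's default HPA name is keda-hpa-<ScaledObject name> unless overridden.
- Namespace scoping, which the real indexed List provides, is omitted.

Each lookup returns only the objects filed under one key. The incoming object is skipped because on updates it is already in the index.

```go
package main

import "fmt"

// scaledObject is a hypothetical, trimmed-down ScaledObject.
type scaledObject struct {
	Name            string
	ScaleTargetName string
	HPANameOverride string // hypothetical stand-in for an explicit HPA name
}

// hpaName assumes the default "keda-hpa-<name>" unless overridden.
func hpaName(so scaledObject) string {
	if so.HPANameOverride != "" {
		return so.HPANameOverride
	}
	return "keda-hpa-" + so.Name
}

// Index extractors: each returns the key(s) an object is filed under.
func byScaleTargetName(so scaledObject) []string { return []string{so.ScaleTargetName} }
func byHPAName(so scaledObject) []string         { return []string{hpaName(so)} }

type index map[string][]scaledObject

func (ix index) add(so scaledObject, extract func(scaledObject) []string) {
	for _, k := range extract(so) {
		ix[k] = append(ix[k], so)
	}
}

// conflicts performs the two narrow lookups, ignoring the incoming object.
func conflicts(incoming scaledObject, targets, hpas index) []string {
	var reasons []string
	for _, other := range targets[incoming.ScaleTargetName] {
		if other.Name != incoming.Name {
			reasons = append(reasons, "scaleTargetRef already managed by "+other.Name)
		}
	}
	for _, other := range hpas[hpaName(incoming)] {
		if other.Name != incoming.Name {
			reasons = append(reasons, "HPA name already used by "+other.Name)
		}
	}
	return reasons
}

func main() {
	targets, hpas := index{}, index{}
	for _, so := range []scaledObject{
		{Name: "web", ScaleTargetName: "web-deploy"},
		{Name: "api", ScaleTargetName: "api-deploy", HPANameOverride: "shared-hpa"},
	} {
		targets.add(so, byScaleTargetName)
		hpas.add(so, byHPAName)
	}

	incoming := scaledObject{Name: "web2", ScaleTargetName: "web-deploy", HPANameOverride: "shared-hpa"}
	fmt.Println(conflicts(incoming, targets, hpas))
	// [scaleTargetRef already managed by web HPA name already used by api]
}
```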

verifyHpas issued an unfiltered kc.List over all HPAs in the namespace on every ScaledObject admission. At 60k HPAs this allocates the entire namespace's HPA list per admission, the same O(N) anti-pattern fixed for verifyScaledObjects in the previous commit.

Add hpaScaleTargetNameIdx (spec.scaleTargetRef.name on HPA objects) in SetupWebhookWithManager and switch verifyHpas to an indexed List, narrowing candidates to the 0–1 HPAs that target the same workload. Also replace the per-HPA json.MarshalIndent debug log with a structured logr call, since the marshal ran unconditionally regardless of log level.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
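Because hpaScaleTargetNameIdx keys on the target's name alone, the index only narrows the candidate set. A Deployment and a StatefulSet with the same name land under the same key, so the remaining fields still have to be checked. The sketch below illustrates that post-filter, with these assumptions:

- The hpa type, hpaIndex, and conflictingHPAs are hypothetical.
- The OwnedBySO field is an assumed stand-in for however the real check recognizes the HPA that KEDA itself manages for this ScaledObject.

```go
package main

import "fmt"

// hpa is a hypothetical, minimal HPA; the scaleTargetRef has kind and name.
type hpa struct {
	Name       string
	TargetKind string
	TargetName string
	OwnedBySO  string // assumed: owning ScaledObject name, "" if none
}

// hpaIndex files HPAs under spec.scaleTargetRef.name only.
type hpaIndex map[string][]hpa

func (ix hpaIndex) add(h hpa) { ix[h.TargetName] = append(ix[h.TargetName], h) }

// conflictingHPAs narrows candidates with the index, then re-checks the kind
// (the index key ignores it) and skips the HPA owned by this ScaledObject.
func conflictingHPAs(ix hpaIndex, soName, kind, name string) []string {
	var out []string
	for _, h := range ix[name] {
		if h.TargetKind != kind || h.OwnedBySO == soName {
			continue
		}
		out = append(out, h.Name)
	}
	return out
}

func main() {
	ix := hpaIndex{}
	ix.add(hpa{Name: "keda-hpa-web", TargetKind: "Deployment", TargetName: "web", OwnedBySO: "web"})
	ix.add(hpa{Name: "manual-hpa", TargetKind: "Deployment", TargetName: "web"})
	ix.add(hpa{Name: "sts-hpa", TargetKind: "StatefulSet", TargetName: "web"})
	fmt.Println(conflictingHPAs(ix, "web", "Deployment", "web")) // [manual-hpa]
}
```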

pawan-regoti and others added 5 commits April 20, 2026 10:16

Use path.Join for constructing Loki query URL (kedacore#7648)

b840d15

Signed-off-by: Pawan Kumar Regoti <pawanregoti@gmail.com>
Signed-off-by: Pawan Regoti <pawanregoti@gmail.com>

refactor pulsar scaler auth (kedacore#6969)

2efa0dd

Signed-off-by: Jan Wozniak <wozniak.jan@gmail.com>

ggarb force-pushed the fix-webhook-verifyScaledObjects-indexer branch from 807ea2a to 6e80753 Compare April 23, 2026 22:25


ggarb wants to merge 5 commits into main from fix-webhook-verifyScaledObjects-indexer
