fix(webhook): eliminate per-admission json.MarshalIndent and index verifyHpas #1
Signed-off-by: Pawan Kumar Regoti <pawanregoti@gmail.com> Signed-off-by: Pawan Regoti <pawanregoti@gmail.com>
Signed-off-by: Jan Wozniak <wozniak.jan@gmail.com>
When KEDA runs in environments where /sys/fs/cgroup/cpu.max is not readable (e.g. EKS auto mode, restricted SecurityContexts), maxprocs.Set() returns a permission error and KEDA crashes on startup. The Go runtime handles GOMAXPROCS sensibly without explicit configuration, so this error should not be fatal. Log it as a warning and continue startup.

Fixes kedacore#7653

Signed-off-by: ManvithaP-hub <62259625+ManvithaP-hub@users.noreply.github.com>
…admission cost

verifyScaledObjects performs two duplicate-conflict checks on every ScaledObject admission: one for duplicate scaleTargetRef and one for duplicate HPA name. Both checks listed ALL ScaledObjects in the namespace via kc.List, which (because controller-runtime's cached client DeepCopies every returned item) allocated O(N) memory per admission.

Measured impact with a heap profile during a 10k SO creation burst: verifyScaledObjects consumed 71% of inuse_space (106 MB at peak). At 60k SOs the list allocates ~900 MB per admission; at a 30/s creation rate that is ~27 GB/s of allocation, which outpaces MADV_FREE and causes the webhook OOMKill loop seen in production-scale tests.

The fix registers two controller-runtime field indexes in SetupWebhookWithManager:

- spec.scaleTargetRef.name → so.Spec.ScaleTargetRef.Name
- spec.hpaName → getHpaName(*so) (computed default or explicit override)

verifyScaledObjects now issues two narrow indexed List calls (each returning 0–1 items in the common case) instead of one full-namespace scan. Per-admission allocation drops from O(N * object_size) to O(1 * object_size) regardless of cluster scale.

Pairs with kedacore#7670 (remove eager MarshalIndent from the same loop) for a complete webhook memory fix at scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Greg Garber <ggarb@netflix.com>
verifyHpas issued an unfiltered kc.List over all HPAs in the namespace on every ScaledObject admission. At 60k HPAs this allocates the entire namespace's HPA list per admission: the same O(N) anti-pattern fixed for verifyScaledObjects in the previous commit.

Add hpaScaleTargetNameIdx (spec.scaleTargetRef.name on HPA objects) in SetupWebhookWithManager and switch verifyHpas to an indexed List, narrowing candidates to the 0–1 HPAs that target the same workload.

Also replace the per-HPA json.MarshalIndent debug log with a structured logr call, since the marshal ran unconditionally regardless of log level.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 807ea2a to 6e80753.
Problem
During a 60k ScaledObject creation burst, the admission webhook OOMKilled 12 times
despite a 20 GiB memory limit. Three root causes:
1. Unconditional json.MarshalIndent on every admission. ValidateCreate, ValidateUpdate, and isRemovingFinalizer each called json.MarshalIndent to build a debug log string, even when V(1) logging was disabled. At burst=60 this generates ~60–100 KB of transient garbage per admission that Go's allocator cannot release via MADV_FREE fast enough under sustained load, causing RSS to climb until OOMKill.
2. isRemovingFinalizer JSON-string spec comparison. The function marshaled both so.Spec and oldSo.Spec to JSON strings to compare them, allocating O(spec_size) on every update admission. The correct tool is reflect.DeepEqual.
3. verifyHpas full-namespace HPA list (O(N)). verifyHpas issued an unfiltered kc.List over all HPAs in the namespace. At 60k HPAs this allocates the entire namespace's HPA list on every SO admission. Fixed with a spec.scaleTargetRef.name field index (same pattern as kedacore#7681).
Fix
1. Replace the json.MarshalIndent + fmt.Sprintf log calls with structured logr.Info(msg, key, value): zero allocation when the log level is inactive.
2. Replace the isRemovingFinalizer JSON string comparison with reflect.DeepEqual.
3. Add a spec.scaleTargetRef.name field index on HPA objects in SetupWebhookWithManager and switch verifyHpas to an indexed List.
Test results
Validated on a 60k ScaledObject creation burst (same cluster and methodology as kedacore#7681):
This is a companion to kedacore#7681. Together they bring webhook memory during a 60k SO
creation burst from OOMKill territory to effectively zero overhead.