fix(scaling): jitter first tick of scale loop to avoid thundering herd #7676
ggarb wants to merge 1 commit into kedacore:main
Conversation
✅ Snyk checks have passed. No issues have been found so far.
Thank you for your contribution! 🙏 Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected. While you are waiting, make sure to:
Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient. Learn more about our contribution guide.
Force-pushed from ef5a691 to 213347c
Force-pushed from 213347c to 483bc90
```go
// ~30-second boundary forever, creating a thundering herd against
// external metric sources. Keyed off the object UID so the phase is
// stable across operator restarts.
if offset := jitterOffset(withTriggers.UID, pollingInterval); offset > 0 {
```
I like the improvement! The only concern I have is that some specific edge cases can use really long polling intervals. Does it make sense to set a max window? Something like

`min(pollingInterval, 1 min)`

If the polling interval is short, this works as-is; for long polling intervals, the first request would be delayed by at most 1 min.
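For concreteness, a minimal sketch of that suggestion (hypothetical helper name and placement; not part of the PR as submitted):

```go
package scaling

import "time"

// jitterWindow caps the jitter span at one minute, so very long polling
// intervals delay the first poll by at most 1 min: min(pollingInterval, 1 min).
// Hypothetical helper illustrating the reviewer's suggestion.
func jitterWindow(pollingInterval time.Duration) time.Duration {
	if pollingInterval > time.Minute {
		return time.Minute
	}
	return pollingInterval
}
```

The call site would then become something like `jitterOffset(uid, jitterWindow(pollingInterval))`.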
I still need to take a good look at the whole thing, but my first question is: what is the reason to put this in its own file rather than inline in `scale_handler.go`?
Force-pushed from 483bc90 to 3e84673
Scale loops previously fired their first tick via time.NewTimer(pollingInterval) with no per-object offset. ScalableObjects created in a short window (bulk apply, or operator restart re-spawning existing loops) then polled on the same pollingInterval boundary forever, creating a periodic thundering herd against external metric sources.

Observed in a load test at 1k Prometheus-trigger ScaledObjects created by kube-burner in a ~33s window: ~50% of subsequent polls failed with HTTP client timeouts even at default pollingInterval 30s, while Prometheus itself was idle (<100ms response, <30 mCPU). Errors came in bursts of ~500 per 6s window every ~10s -- the signature of aligned tick timers, not tail latency. Raising KEDA_HTTP_DEFAULT_TIMEOUT from 3s to 10s did not help; the timeouts simply moved to 10s.

This change inserts a deterministic per-object offset (hash(UID) mod pollingInterval) before the first tick, spreading scale loops spawned in a batch across the polling interval. Keyed off UID so the phase is stable across operator restarts -- otherwise every leader-election flip would re-align all loops. Subsequent ticks continue at pollingInterval with no change in semantics.

Added helper jitterOffset in pkg/scaling/jitter.go with unit tests covering determinism, range, zero/empty inputs, and distribution across 10k synthetic UIDs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Greg Garber <ggarb@netflix.com>
Force-pushed from 3e84673 to 3c447de
Problem
KEDA spawns one scale loop per
ScalableObject(ScaledObject/ScaledJob). Each scale loop usestime.NewTimer(pollingInterval)with no per-object offset (pkg/scaling/scale_handler.go). That means objects whose scale loops arespawned in a short window — by a bulk creation, a single reconcile pass after an operator restart, or a batch API call — end up polling on the same
pollingIntervalboundary forever.For external-metric triggers (Prometheus, custom metrics, etc.) this produces a periodic thundering herd against the upstream metric source and against any client-side infrastructure (DNS, TCP accept queues, per-scaler
http.Transportconnection pools) in the path. Under light load the effect is invisible; under even moderate per-cluster scale it becomes a correctness problem.
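To make the failure mode concrete, here is a small self-contained simulation (illustrative only, not KEDA code): loops spawned at the same instant keep firing in the same one-second bucket forever, while a hash-derived per-object offset spreads them across the interval.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

func main() {
	const n = 1000
	interval := 30 * time.Second

	// Bucket each loop's firing phase (mod interval) into 1s bins.
	buckets := func(offset func(i int) time.Duration) map[time.Duration]int {
		m := map[time.Duration]int{}
		for i := 0; i < n; i++ {
			phase := offset(i) % interval
			m[phase.Truncate(time.Second)]++
		}
		return m
	}

	// Before: every loop's first tick fires exactly interval after spawn,
	// so loops spawned together share one phase forever.
	noJitter := buckets(func(i int) time.Duration { return 0 })

	// After: deterministic per-object phase, hash(uid) mod interval.
	withJitter := buckets(func(i int) time.Duration {
		h := fnv.New64a()
		fmt.Fprintf(h, "uid-%d", i) // stand-in for the Kubernetes object UID
		return time.Duration(h.Sum64() % uint64(interval))
	})

	fmt.Println("distinct 1s buckets without jitter:", len(noJitter))  // 1
	fmt.Println("distinct 1s buckets with jitter:   ", len(withJitter)) // ~30
}
```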
Reproduction
Measured on a 1k ScaledObject Prometheus-trigger load test (bulk-created by `kube-burner` in a ~33 s window), KEDA 2.19 release image, default `pollingInterval: 30s`, default `KEDA_HTTP_DEFAULT_TIMEOUT=3000`:

- ~50% of polls fail with `context deadline exceeded (Client.Timeout exceeded while awaiting headers)`.
- Time spent in `net.(*Resolver).goLookupIPCNAMEOrder` — DNS re-resolution for every cold connection.
- `dial tcp: connection refused` errors start appearing too.
Fix
Insert a deterministic per-object offset `fnv64a(UID) % pollingInterval` before the first tick of each scale loop. Subsequent ticks are unchanged. Keyed off the object's UID so the phase is stable across operator restarts — otherwise every leader-election flip or pod crash would re-align all loops.
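A minimal sketch of what such a helper could look like, assuming the FNV-64a hash named above (the PR's actual `jitterOffset` body isn't quoted in this thread, so treat this as an illustration):

```go
package scaling

import (
	"hash/fnv"
	"time"

	"k8s.io/apimachinery/pkg/types"
)

// jitterOffset returns a deterministic phase offset in [0, pollingInterval)
// derived from the object's UID: same UID, same offset, across restarts.
func jitterOffset(uid types.UID, pollingInterval time.Duration) time.Duration {
	if pollingInterval <= 0 || uid == "" {
		return 0
	}
	h := fnv.New64a()
	h.Write([]byte(uid)) // fnv64a over the UID bytes; Write never errors
	return time.Duration(h.Sum64() % uint64(pollingInterval))
}
```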
Helper lives in `pkg/scaling/jitter.go`. Call site is a small `select` against the jitter timer and `ctx.Done()` before the existing `for` loop in `startScaleLoop`.
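The shape of that call site, reconstructed from the diff excerpt quoted earlier (`withTriggers`, `ctx`, and `pollingInterval` are assumed to be in scope inside `startScaleLoop`; the exact code may differ):

```go
if offset := jitterOffset(withTriggers.UID, pollingInterval); offset > 0 {
	jitterTimer := time.NewTimer(offset)
	select {
	case <-jitterTimer.C:
		// first tick phase-shifted; fall through to the normal polling loop
	case <-ctx.Done():
		jitterTimer.Stop()
		return
	}
}
// existing loop follows: timer := time.NewTimer(pollingInterval); ...
```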
Validation

Same cluster, same workload, with this patch. Prometheus' `--web.max-connections` was raised above its default 512 for these runs — with KEDA's current per-scaler `http.Transport` (one keepalive connection per scaler), N > ~500 scalers exceeds Prometheus' default connection cap regardless of this fix; once polls are no longer bursting, the connection count stabilizes above the default and Prometheus starts rejecting new connections. That's a separate efficiency concern (a shared `Transport` across scalers of the same type) that can be addressed in a follow-up.
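A rough illustration of that follow-up idea (hypothetical package and names; not code from this PR):

```go
package prometheus

import (
	"net/http"
	"time"
)

// Hypothetical follow-up sketch: one Transport shared by all scalers of a
// type, so N scalers draw from a bounded keepalive pool instead of each
// holding its own idle connection to the metric source.
var sharedTransport = &http.Transport{
	MaxIdleConns:        100,
	MaxIdleConnsPerHost: 100,
	IdleConnTimeout:     90 * time.Second,
}

func newSharedHTTPClient(timeout time.Duration) *http.Client {
	return &http.Client{Timeout: timeout, Transport: sharedTransport}
}
```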
1k Prom-trigger:

10k Prom-trigger (same patch):
- `type: cpu` sizing baseline within 10%

Tests
Added `pkg/scaling/jitter_test.go` covering:

- offsets in `[0, pollingInterval)` across 1000 synthetic UIDs

`go test ./pkg/...` passes on this branch.
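A sketch of the kind of checks described, assuming the `jitterOffset` signature shown in the diff excerpt (the actual test file isn't quoted in this thread):

```go
package scaling

import (
	"fmt"
	"testing"
	"time"

	"k8s.io/apimachinery/pkg/types"
)

func TestJitterOffsetRangeAndSpread(t *testing.T) {
	interval := 30 * time.Second
	buckets := map[time.Duration]int{}
	for i := 0; i < 1000; i++ {
		uid := types.UID(fmt.Sprintf("uid-%04d", i)) // synthetic UID
		off := jitterOffset(uid, interval)
		// range: offsets must land in [0, pollingInterval)
		if off < 0 || off >= interval {
			t.Fatalf("offset %v outside [0, %v)", off, interval)
		}
		buckets[off.Truncate(time.Second)]++
	}
	// distribution: offsets should spread, not collapse into one bucket
	if len(buckets) < 10 {
		t.Fatalf("offsets poorly spread: %d distinct 1s buckets", len(buckets))
	}
	// determinism: same UID must yield the same offset
	if jitterOffset("uid-0001", interval) != jitterOffset("uid-0001", interval) {
		t.Fatal("jitterOffset is not deterministic")
	}
}
```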
Notes

- Worst-case added delay before the first tick is one `pollingInterval`.
- Applies to both `ScaledObject` and `ScaledJob` (shared `startScaleLoop`).
- Push scalers are unaffected (`startPushScalers` is unchanged).

Checklist