fix: rebuild CA cert pool on rotation to prevent monotonic growth#7700
fix: rebuild CA cert pool on rotation to prevent monotonic growth#7700ManvithaP-hub wants to merge 1 commit intokedacore:mainfrom
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
|
Thank you for your contribution! 🙏 Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer as expected. While you are waiting, make sure to:
Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient. Learn more about our contribution guide. |
65627a3 to
5581624
Compare
There was a problem hiding this comment.
Pull request overview
This PR aims to fix certificate-rotation behavior in the metrics service by rebuilding the CA pool during mTLS secret updates, so the operator does not keep accumulating duplicate CA entries over time.
Changes:
- Adds a
buildCertPool()helper to recreate the CA pool from scratch instead of appending into a long-lived pool. - Updates the metrics service TLS reload loop to swap in a rebuilt CA pool and refreshed keypair on fsnotify rotation events.
- Adds a changelog entry for the metrics service certificate rotation fix.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
pkg/metricsservice/utils/tls.go |
Refactors gRPC TLS credential loading and certificate rotation logic for the metrics service. |
CHANGELOG.md |
Adds an unreleased fix note for the CA cert pool rotation change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| config := &tls.Config{ | ||
| MinVersion: kedautil.GetServiceMinTLSVersion(), | ||
| CipherSuites: kedautil.GetServiceTLSCipherList(), | ||
| MinVersion: tls.VersionTLS13, |
| cfg := &tls.Config{ | ||
| MinVersion: tls.VersionTLS13, | ||
| Certificates: []tls.Certificate{cert}, |
| GetConfigForClient: func(_ *tls.ClientHelloInfo) (*tls.Config, error) { | ||
| certMutex.RLock() | ||
| pool := certPool | ||
| cert := mTLSCertificate | ||
| certMutex.RUnlock() | ||
| cfg := &tls.Config{ | ||
| MinVersion: tls.VersionTLS13, | ||
| Certificates: []tls.Certificate{cert}, | ||
| } | ||
| if server { | ||
| cfg.ClientAuth = tls.RequireAndVerifyClientCert | ||
| cfg.ClientCAs = pool | ||
| } else { | ||
| cfg.RootCAs = pool | ||
| } |
| config := &tls.Config{ | ||
| MinVersion: kedautil.GetServiceMinTLSVersion(), | ||
| CipherSuites: kedautil.GetServiceTLSCipherList(), | ||
| MinVersion: tls.VersionTLS13, |
|
duplicated #7713 |
Previously, LoadGrpcTLSCredentials created a single x509.CertPool at startup and called AppendCertsFromPEM on the same pool for every fsnotify rotation event. Because x509.CertPool only ever grows, this caused unbounded memory growth over the lifetime of the operator pod. This commit introduces buildCertPool() which constructs a fresh pool from scratch on every call. On rotation, the new pool atomically replaces the old one under the existing certMutex. GetConfigForClient is used so each TLS handshake reads the current pool rather than the one pinned at startup. Fixes kedacore#7691 Signed-off-by: ManvithaP-hub <62259625+ManvithaP-hub@users.noreply.github.com>
5581624 to
f62ea7e
Compare
What this PR does
Fixes a performance bug in the gRPC Metrics Server mTLS certificate rotation logic where the x509.CertPool grew unboundedly over the lifetime of the operator pod.
Root Cause
LoadGrpcTLSCredentials created a single x509.CertPool at startup and called AppendCertsFromPEM on it for every fsnotify rotation event. Because x509.CertPool only ever grows and never shrinks, each Kubernetes Secret rotation appended duplicate CA entries to the same pool — causing monotonic memory growth proportional to the number of rotations.
Additionally, config.ClientCAs and config.RootCAs were set once at startup, so even a correct pool rebuild would not have been picked up by subsequent TLS handshakes.
Fix
Introduces buildCertPool() which constructs a fresh x509.CertPool from scratch on every call instead of reusing the existing one
On each rotation event, the new pool replaces the old one under the existing certMutex (pointer swap, no mutation)
Adds GetConfigForClient callback so every TLS handshake reads the current pool from under the mutex rather than the one pinned at startup
Testing
go build ./pkg/metricsservice/... passes cleanly
No existing tests in this package; the fix is contained to pkg/metricsservice/utils/tls.go