Skip to content

fix: rebuild CA cert pool on rotation to prevent monotonic growth#7700

Open
ManvithaP-hub wants to merge 1 commit intokedacore:mainfrom
ManvithaP-hub:fix-certpool-growth-7691
Open

fix: rebuild CA cert pool on rotation to prevent monotonic growth#7700
ManvithaP-hub wants to merge 1 commit intokedacore:mainfrom
ManvithaP-hub:fix-certpool-growth-7691

Conversation

@ManvithaP-hub
Copy link
Copy Markdown
Contributor

What this PR does
Fixes a performance bug in the gRPC Metrics Server mTLS certificate rotation logic where the x509.CertPool grew unboundedly over the lifetime of the operator pod.
Root Cause
LoadGrpcTLSCredentials created a single x509.CertPool at startup and called AppendCertsFromPEM on it for every fsnotify rotation event. Because x509.CertPool only ever grows and never shrinks, each Kubernetes Secret rotation appended duplicate CA entries to the same pool — causing monotonic memory growth proportional to the number of rotations.
Additionally, config.ClientCAs and config.RootCAs were set once at startup, so even a correct pool rebuild would not have been picked up by subsequent TLS handshakes.
Fix

Introduces buildCertPool() which constructs a fresh x509.CertPool from scratch on every call instead of reusing the existing one
On each rotation event, the new pool replaces the old one under the existing certMutex (pointer swap, no mutation)
Adds GetConfigForClient callback so every TLS handshake reads the current pool from under the mutex rather than the one pinned at startup

Testing

go build ./pkg/metricsservice/... passes cleanly
No existing tests in this package; the fix is contained to pkg/metricsservice/utils/tls.go

@ManvithaP-hub ManvithaP-hub requested a review from a team as a code owner May 1, 2026 02:53
@snyk-io
Copy link
Copy Markdown

snyk-io Bot commented May 1, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer as expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@keda-automation keda-automation requested a review from a team May 1, 2026 02:53
@ManvithaP-hub ManvithaP-hub force-pushed the fix-certpool-growth-7691 branch from 65627a3 to 5581624 Compare May 1, 2026 03:23
@JorTurFer JorTurFer requested a review from Copilot May 3, 2026 17:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix certificate-rotation behavior in the metrics service by rebuilding the CA pool during mTLS secret updates, so the operator does not keep accumulating duplicate CA entries over time.

Changes:

  • Adds a buildCertPool() helper to recreate the CA pool from scratch instead of appending into a long-lived pool.
  • Updates the metrics service TLS reload loop to swap in a rebuilt CA pool and refreshed keypair on fsnotify rotation events.
  • Adds a changelog entry for the metrics service certificate rotation fix.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
pkg/metricsservice/utils/tls.go Refactors gRPC TLS credential loading and certificate rotation logic for the metrics service.
CHANGELOG.md Adds an unreleased fix note for the CA cert pool rotation change.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 141 to +142
config := &tls.Config{
MinVersion: kedautil.GetServiceMinTLSVersion(),
CipherSuites: kedautil.GetServiceTLSCipherList(),
MinVersion: tls.VersionTLS13,
Comment on lines +158 to +160
cfg := &tls.Config{
MinVersion: tls.VersionTLS13,
Certificates: []tls.Certificate{cert},
Comment on lines +153 to +167
GetConfigForClient: func(_ *tls.ClientHelloInfo) (*tls.Config, error) {
certMutex.RLock()
pool := certPool
cert := mTLSCertificate
certMutex.RUnlock()
cfg := &tls.Config{
MinVersion: tls.VersionTLS13,
Certificates: []tls.Certificate{cert},
}
if server {
cfg.ClientAuth = tls.RequireAndVerifyClientCert
cfg.ClientCAs = pool
} else {
cfg.RootCAs = pool
}
Comment on lines 141 to +142
config := &tls.Config{
MinVersion: kedautil.GetServiceMinTLSVersion(),
CipherSuites: kedautil.GetServiceTLSCipherList(),
MinVersion: tls.VersionTLS13,
@JorTurFer
Copy link
Copy Markdown
Member

duplicated #7713

@JorTurFer JorTurFer added the duplicate This issue or pull request already exists label May 3, 2026
Previously, LoadGrpcTLSCredentials created a single x509.CertPool at
startup and called AppendCertsFromPEM on the same pool for every
fsnotify rotation event. Because x509.CertPool only ever grows, this
caused unbounded memory growth over the lifetime of the operator pod.

This commit introduces buildCertPool() which constructs a fresh pool
from scratch on every call. On rotation, the new pool atomically
replaces the old one under the existing certMutex. GetConfigForClient
is used so each TLS handshake reads the current pool rather than the
one pinned at startup.

Fixes kedacore#7691

Signed-off-by: ManvithaP-hub <62259625+ManvithaP-hub@users.noreply.github.com>
@ManvithaP-hub ManvithaP-hub force-pushed the fix-certpool-growth-7691 branch from 5581624 to f62ea7e Compare May 3, 2026 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

duplicate This issue or pull request already exists

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants