fix: tolerate transient scaler errors during cache rebuild#7714
aayushbaluni wants to merge 1 commit into kedacore:main from
Hello. Personally, I'd not add any extra mechanism but just accept that the upstream has failed twice in a row. Maybe other colleagues have a different opinion; let's give them some time to review the PR.
|
Agreed with @JorTurFer. Shouldn't this be handled in scalers_cache.go rather than in the callers? The |
This was a race condition that we solved last year IIRC |
|
@JorTurFer @rickbrouwer Thanks for the review and context. You're right — the existing two-attempt retry in
Given that, this PR adds unnecessary complexity. I'm happy to close it if you both agree the current behavior is correct as-is. Let me know if there's a different angle you'd like me to explore instead. |
|
In which version have you seen |
|
2.19 is reported in the issue. We see it occasionally too in 2.19. I think it can, based on just a super quick analysis. The author is welcome to dive into it themselves if they wish. Should the author not do so and close the PR, then I will dive in to see if everything is correct.
Problem
During scaler cache rebuild, a transient "redis: client is closed" error incorrectly sets Ready=False on ScaledJob, causing unnecessary condition flapping.
Change
Add tolerance for transient scaler errors during cache rebuild so the Ready condition is not set to False for brief mid-rebuild client closures.
Fixes #7574.