Synchronize gateway liveness (#15096) #15150
officialasishkumar wants to merge 2 commits into linkerd:main
raykroeker
left a comment
The lack of gatewayAlive synchronization was discussed when the code was first introduced.
I agree with the original conversation: this is not a data race in practice, even though static analysis suggests it is.
adleong
left a comment
Since this field is just a single bool, did you consider using an atomic bool instead of adding a mutex?
I false-started an implementation that used an atomic bool before I found the attached PR.
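For reference, the atomic-bool approach raised in this thread could look roughly like the sketch below. It is an illustration only, not the PR's code: the `watcher` type and the `isGatewayAlive`/`setGatewayAlive` names are hypothetical stand-ins, and it assumes Go 1.19 or later, where `sync/atomic` provides the `atomic.Bool` type.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// watcher is a hypothetical stand-in for RemoteClusterServiceWatcher.
// Only the liveness flag is modeled here.
type watcher struct {
	// The zero value of atomic.Bool is false and ready to use.
	gatewayAlive atomic.Bool
}

// isGatewayAlive is an assumed accessor name; it reads the flag atomically.
func (w *watcher) isGatewayAlive() bool {
	return w.gatewayAlive.Load()
}

// setGatewayAlive is an assumed setter name; it writes the flag atomically.
func (w *watcher) setGatewayAlive(alive bool) {
	w.gatewayAlive.Store(alive)
}

func main() {
	w := &watcher{}
	fmt.Println(w.isGatewayAlive()) // prints "false"
	w.setGatewayAlive(true)
	fmt.Println(w.isGatewayAlive()) // prints "true"
}
```

For a single flag with no invariant tying it to other fields, an atomic keeps the reader path lock-free and avoids the extra mutex field.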
Problem
RemoteClusterServiceWatcher updates gateway liveness from a background goroutine while mirror endpoint reconciliation reads the same state without synchronization. This leaves the multicluster service mirror vulnerable to race detector failures and inconsistent readiness updates under gateway flapping.
Solution
Protect gateway liveness with an atomic bool accessor, route the liveness loop through a helper that uses those accessors, and update existing tests to use the synchronized setter. Add a race-focused regression test that exercises the liveness watcher concurrently with endpoint readiness updates.
Validation
- go test ./multicluster/service-mirror/...
- go test -race ./multicluster/service-mirror/...
- go test -race ./multicluster/service-mirror -run TestGatewayAliveSynchronization -count=1
Fixes linkerd#15096
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
c02c418 to de0bb5f
Replaced the mutex-protected bool with |
Problem
RemoteClusterServiceWatcher updates gateway liveness from a background goroutine while mirror endpoint reconciliation reads the same state without synchronization. This leaves the multicluster service mirror vulnerable to race detector failures and inconsistent readiness updates under gateway flapping.
Solution
Protect gateway liveness with a dedicated RWMutex-backed accessor, route the liveness loop through a helper that uses those accessors, and update existing tests to use the synchronized setter. Add a race-focused regression test that exercises the liveness watcher concurrently with endpoint readiness updates.
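An RWMutex-backed accessor of the kind described above might look like the following sketch. The `gatewayLiveness` type and its `get`/`set` methods are hypothetical names chosen for illustration and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// gatewayLiveness is a hypothetical wrapper that guards a single bool
// with a read/write mutex.
type gatewayLiveness struct {
	mu    sync.RWMutex
	alive bool
}

// get takes the read lock, so concurrent readers do not block each other.
func (g *gatewayLiveness) get() bool {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.alive
}

// set takes the write lock, excluding readers and other writers.
func (g *gatewayLiveness) set(alive bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.alive = alive
}

func main() {
	var g gatewayLiveness // the zero value is usable: unlocked, not alive
	g.set(true)
	fmt.Println(g.get()) // prints "true"
}
```

A mutex tends to pay off when several fields must change together; for one independent flag it adds locking overhead without an extra guarantee.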
Validation
Fixes #15096
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>