Fix nil pointer panic in Deliverer.Stop() during leader election #5420
Ady0333 wants to merge 2 commits into
Deliverer.Stop() and BFTDeliverer.Stop() unconditionally called d.blockReceiver.Stop() without a nil guard. blockReceiver is only set inside DeliverBlocks() after a successful orderer connection, so calling Stop() before that connection succeeds (e.g. when a peer renounces gossip leadership while the orderer is unreachable) caused a nil pointer dereference and panicked the peer. Add a nil check before calling blockReceiver.Stop() in both paths. Add regression tests that call Stop() before DeliverBlocks() is ever started to confirm no panic occurs. Signed-off-by: Ady0333 <adityashinde1525@gmail.com>
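A minimal Go sketch of the guarded Stop() pattern described above. The types and helpers here (deliverer, receiver, newDeliverer, connect) and the stopped flag that makes Stop() idempotent are hypothetical simplifications, not Fabric's actual code. Only the shape of the fix comes from the PR: close DoneC, then stop the block receiver only if an orderer connection ever set it.

```go
package main

import (
	"fmt"
	"sync"
)

// receiver is a hypothetical stand-in for the block receiver. Its Stop method
// reads a field, so calling it through a nil *receiver panics.
type receiver struct {
	stopC chan struct{}
}

func (r *receiver) Stop() {
	close(r.stopC) // dereferences r: a nil r panics here
}

// deliverer is a hypothetical, simplified model of Deliverer/BFTDeliverer.
type deliverer struct {
	mutex         sync.Mutex
	stopped       bool          // assumption: prevents closing doneC twice
	doneC         chan struct{} // closed by Stop(); in the PR this signals DeliverBlocks() to exit
	blockReceiver *receiver     // nil until an orderer connection succeeds
}

func newDeliverer() *deliverer {
	return &deliverer{doneC: make(chan struct{})}
}

// connect (hypothetical) simulates DeliverBlocks() setting blockReceiver
// after a successful connection to an orderer.
func (d *deliverer) connect() {
	d.mutex.Lock()
	defer d.mutex.Unlock()
	d.blockReceiver = &receiver{stopC: make(chan struct{})}
}

// Stop is safe to call whether or not connect() has run.
func (d *deliverer) Stop() {
	d.mutex.Lock()
	defer d.mutex.Unlock()
	if d.stopped {
		return
	}
	d.stopped = true
	close(d.doneC)
	// The nil guard: skip the receiver if no connection ever succeeded.
	if d.blockReceiver != nil {
		d.blockReceiver.Stop()
	}
}

func main() {
	// Leadership renounced while the orderer is unreachable: no receiver yet.
	early := newDeliverer()
	early.Stop()
	fmt.Println("Stop() before connect: no panic")

	// Normal path: the receiver exists and is stopped too.
	connected := newDeliverer()
	connected.connect()
	connected.Stop()
	fmt.Println("Stop() after connect: receiver stopped")
}
```

Holding the mutex for the whole of Stop() means a concurrent connect() either finishes before the nil check (and its receiver is stopped) or runs afterwards; in the sketch nothing prevents the latter, which is why the PR relies on the closed DoneC to make DeliverBlocks() exit.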
@Ady0333 Thank you for your hard work.
6f2ce0e to 807af94
Signed-off-by: Ady0333 <adityashinde1525@gmail.com>
807af94 to c56431f
@pfi79 I will finish all my unfinished PRs. All the checks have now passed here. Please review this PR and let me know if any further changes are required.
Thanks for the review!
The issue arises from a TOCTOU race where a member can be present in aliveLastTS but missing from id2Member due to a concurrent purge. In the original code, this leads to a nil pointer dereference when accessing member fields. I’ll update the test to better demonstrate the failure mode by clearly reproducing this inconsistent state and showing that the code would panic without the nil guard, while remaining stable with the fix. Please let me know if you'd prefer the test to explicitly assert the pre-fix panic behavior or focus on validating the invariant violation scenario.
I have reviewed the code that you are editing, and it seems to me that you are mistaken.
Type of change
Description
Fixed a nil pointer dereference that crashes the peer when Stop() is called before a successful orderer connection.

The issue occurs during normal gossip leader election when the orderer is unreachable. When a peer renounces leadership, it calls StopDeliverForChannel() → Deliverer.Stop(), which unconditionally calls d.blockReceiver.Stop(). However, blockReceiver is only initialized inside DeliverBlocks() after a successful gRPC connection to an orderer. If Stop() is called before that connection succeeds, d.blockReceiver is still nil, so the call dereferences a nil pointer and the peer panics and crashes.

I've added a nil check before calling blockReceiver.Stop() in both Deliverer.Stop() and BFTDeliverer.Stop(). The fix is safe because close(d.DoneC) already signals DeliverBlocks() to exit, and the mutex protects against races.

I've also added regression tests that reproduce the crash scenario: calling Stop() before DeliverBlocks() starts. These tests panic without the fix and pass with it.

Additional details
This affects production networks during:
The crash is not a shutdown-only bug; it happens during normal operation when leadership flips while the orderers are down.
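The regression tests are described as calling Stop() before DeliverBlocks() starts, panicking without the fix and passing with it. A self-contained sketch of that before/after check, using recover to detect the panic, might look like the following. The names receiver, stopWithoutGuard, stopWithGuard, and panics are hypothetical; the PR's real tests presumably use Go's testing package against the actual Deliverer and BFTDeliverer types.

```go
package main

import "fmt"

// receiver is a hypothetical stand-in for the block receiver; Stop reads a
// field, so calling it through a nil *receiver panics.
type receiver struct {
	stopC chan struct{}
}

func (r *receiver) Stop() { close(r.stopC) }

// stopWithoutGuard mimics the pre-fix behavior: no nil check.
func stopWithoutGuard(r *receiver) { r.Stop() }

// stopWithGuard mimics the fixed behavior: skip a receiver that was never set.
func stopWithGuard(r *receiver) {
	if r != nil {
		r.Stop()
	}
}

// panics reports whether f panics, recovering so the caller keeps running.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	var notYetConnected *receiver // no orderer connection yet, so still nil
	fmt.Println("without guard panics:", panics(func() { stopWithoutGuard(notYetConnected) })) // true
	fmt.Println("with guard panics:", panics(func() { stopWithGuard(notYetConnected) }))       // false
}
```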
Related issues
Fixes the peer crash during gossip leader election when orderers are unreachable.