Skip to content

Fix nil pointer panic in Deliverer.Stop() during leader election#5420

Open
Ady0333 wants to merge 2 commits into
hyperledger:mainfrom
Ady0333:fix/deliverer-stop-nil-blockReceiver
Open

Fix nil pointer panic in Deliverer.Stop() during leader election#5420
Ady0333 wants to merge 2 commits into
hyperledger:mainfrom
Ady0333:fix/deliverer-stop-nil-blockReceiver

Conversation

@Ady0333
Copy link
Copy Markdown
Contributor

@Ady0333 Ady0333 commented Mar 16, 2026

Type of change

  • Bug fix

Description

Fixed a nil pointer dereference that crashes the peer when Stop() is called before a successful orderer connection.

The issue occurs during normal gossip leader election when the orderer is unreachable. When a peer renounces leadership, it calls StopDeliverForChannel()Deliverer.Stop(), which unconditionally dereferences d.blockReceiver.Stop(). However, blockReceiver is only initialized inside DeliverBlocks() after a successful gRPC connection to an orderer. If Stop() is called before that connection succeeds, the peer panics and crashes.

I've added a nil check before calling blockReceiver.Stop() in both Deliverer.Stop() and BFTDeliverer.Stop(). The fix is safe because close(d.DoneC) already signals DeliverBlocks() to exit, and the mutex protects against races.

I've also added regression tests that reproduce the crash scenario - calling Stop() before DeliverBlocks() starts. These tests panic without the fix and pass with it.


Additional details

This affects production networks during:

  • Orderer maintenance windows
  • Network partitions
  • Rapid gossip leader election changes
  • Peers joining channels when orderers are unreachable

The crash is not a shutdown-only bug - it happens during normal operation when leadership flips while orderers are down.


Related issues

Fixes the peer crash during gossip leader election when orderers are unreachable.


Fixed a critical nil pointer dereference in the delivery service that caused peer crashes during gossip leader election when orderers were unreachable. The bug affected both CFT (etcdraft) and BFT consensus modes. When a peer renounced gossip leadership while unable to connect to orderers, the delivery service's Stop() method would panic, crashing the entire peer process and disrupting all channels on that peer.

Deliverer.Stop() and BFTDeliverer.Stop() unconditionally called
d.blockReceiver.Stop() without a nil guard. blockReceiver is only
set inside DeliverBlocks() after a successful orderer connection,
so calling Stop() before that connection succeeds (e.g. when a peer
renounces gossip leadership while the orderer is unreachable) caused
a nil pointer dereference and panicked the peer.

Add a nil check before calling blockReceiver.Stop() in both paths.
Add regression tests that call Stop() before DeliverBlocks() is
ever started to confirm no panic occurs.

Signed-off-by: Ady0333 <adityashinde1525@gmail.com>
@Ady0333 Ady0333 requested a review from a team as a code owner March 16, 2026 06:16
@Ady0333
Copy link
Copy Markdown
Contributor Author

Ady0333 commented Mar 16, 2026

@C0rWin and @pfi79 , Please review this pr once you have time! Happy to make any changes if needed.

@pfi79
Copy link
Copy Markdown
Contributor

pfi79 commented Mar 20, 2026

@Ady0333 Thank you for your hard work.

  1. You have unfinished PR. I hope you will not abandon them and bring them to the final.

  2. I don't remember needing a nil check here. As far as I remember, this code is written in such a way that it is called only if not nil. If I am wrong, then I ask you to convince me, for example, by writing an appropriate integration test.

@Ady0333 Ady0333 force-pushed the fix/deliverer-stop-nil-blockReceiver branch 2 times, most recently from 6f2ce0e to 807af94 Compare March 20, 2026 19:57
Signed-off-by: Ady0333 <adityashinde1525@gmail.com>
@Ady0333 Ady0333 force-pushed the fix/deliverer-stop-nil-blockReceiver branch from 807af94 to c56431f Compare March 22, 2026 04:57
@Ady0333
Copy link
Copy Markdown
Contributor Author

Ady0333 commented Mar 22, 2026

@pfi79 I will finish all my unfinished PRs. Here all the checks are passed now. Please review this pr once and let me know if any more changes are required...

@pfi79
Copy link
Copy Markdown
Contributor

pfi79 commented Mar 22, 2026

@pfi79 I will finish all my unfinished PRs. Here all the checks are passed now. Please review this pr once and let me know if any more changes are required...

  1. 3 files with the txt extension are not needed
  2. your tests show your fix, but they do not show that the existing fabric code is inoperable in the location you fixed. That is, I believe that so far no evidence of the correctness of your correction has been provided.

@Ady0333
Copy link
Copy Markdown
Contributor Author

Ady0333 commented Mar 22, 2026

@pfi79

Thanks for the review!

  1. I’ll remove the unnecessary .txt files from the PR.

  2. Regarding the test, You're right. The current test only verifies that the fix prevents a panic, but it does not explicitly demonstrate that the original implementation was unsafe.

The issue arises from a TOCTOU race where a member can be present in aliveLastTS but missing from id2Member due to a concurrent purge. In the original code, this leads to a nil pointer dereference when accessing member fields.

I’ll update the test to better demonstrate the failure mode by clearly reproducing this inconsistent state and showing that the code would panic without the nil guard, while remaining stable with the fix.

Please let me know if you'd prefer the test to explicitly assert the pre-fix panic behavior or focus on validating the invariant violation scenario.

@pfi79
Copy link
Copy Markdown
Contributor

pfi79 commented Mar 30, 2026

The issue arises from a TOCTOU race where a member can be present in aliveLastTS but missing from id2Member due to a concurrent purge. In the original code, this leads to a nil pointer dereference when accessing member fields.

I have reviewed the code that you are editing. And it seems to me that you are mistaken.,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants