Skip to content

[Fix] Make the validators dump their block tree correctly#4234

Open
ljedrz wants to merge 5 commits intostagingfrom
fix/validator_block_tree_dump
Open

[Fix] Make the validators dump their block tree correctly#4234
ljedrz wants to merge 5 commits intostagingfrom
fix/validator_block_tree_dump

Conversation

@ljedrz
Copy link
Copy Markdown
Collaborator

@ljedrz ljedrz commented Apr 27, 2026

The Drop impl of the Ledger is what causes the block tree to be cached on shutdown. As long as there are any "zombie tasks" which hold a reference to the ledger post-shutdown, that impl won't fire, forcing the node to fully rebuild the block tree on startup (or to report an invalid cached block tree if it's old, as it was reported in the linked issue).

This was debugged using tokio-console. I've also considered downgrading some of the Ledger-related Arcs to Weak, but sadly, that would result in a Resultpocalypse. That being said, it is perfectly fine to use Arc everywhere (like we do now), as long as we make sure that the persistent tasks that own them are terminated during shutdown.

Fixes #4227.

Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
@ljedrz ljedrz requested review from kaimast and vicsn April 27, 2026 14:03
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Comment thread node/sync/src/ping.rs
ljedrz added 3 commits April 28, 2026 13:04
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
@ljedrz
Copy link
Copy Markdown
Collaborator Author

ljedrz commented Apr 28, 2026

I've updated the PR - now no zombie tasks are expected to exist at shutdown, and if any are found, an ERROR log is emitted.

Copy link
Copy Markdown
Collaborator

@vicsn vicsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Comment thread node/rest/src/lib.rs

/// Shuts down the REST instance.
pub fn shut_down(&self) {
self.handles.lock().iter().for_each(|handle| handle.abort());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter as much but using drain(..) would also drop the handles when calling shut down.

Comment thread node/sync/src/ping.rs
/// Wake up the ping task so that its inner loop iterates and breaks.
/// It should only be called after the ledger has already been stopped.
pub fn stop(&self) {
self.notify.notify_one();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just abort the ping task here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] The stored height is different than the one in the block tree

3 participants