
[Feature] Batch Triggers #4190

Open
kaimast wants to merge 24 commits into staging from feat/batch-triggers

Conversation

@kaimast
Contributor

@kaimast kaimast commented Mar 30, 2026

This PR changes the batch creation task to trigger on certain events instead of only periodically. The goal is to avoid cases where a node that is not yet ready to propose waits another full MIN_BATCH_DELAY (1s) instead of waiting only for the relevant conditions. It also ensures that the batch delay timer is reset whenever the node advances.

The most important changes in this PR are in Primary::start_handlers and the new proposal_task.rs module. The README details when the batch creation task is woken up.

Overview of Changes

The major change of this PR is that proposals now occur in a single location -- the proposal task. It is woken up on specific events, such as timer expiration or receiving sufficient certificates for a round.
Unlike before, when it woke up every second and retried independently of round advancement, it now correctly resets its timers after advancing rounds.
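
A minimal sketch of this wake-up pattern (the event names, channel, and try_propose helper below are illustrative, not the actual API of this PR):

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::{Instant, sleep_until};

/// Illustrative events that can wake the proposal task early.
enum ProposalEvent {
    /// Sufficient certificates were received for the current round.
    QuorumReached,
    /// The node advanced to a new round.
    RoundAdvanced,
}

const MIN_BATCH_DELAY: Duration = Duration::from_secs(1);

async fn proposal_loop(mut events: mpsc::Receiver<ProposalEvent>) {
    // The delay is anchored to the start of the current round and reset
    // whenever the node advances, instead of ticking independently.
    let mut round_start = Instant::now();
    loop {
        tokio::select! {
            // Wake up once the minimum batch delay for this round elapses.
            _ = sleep_until(round_start + MIN_BATCH_DELAY) => {
                try_propose().await;
                // Restart the delay so this branch does not spin while
                // waiting for the round to advance.
                round_start = Instant::now();
            }
            // Or wake up early on a relevant event.
            Some(event) = events.recv() => match event {
                ProposalEvent::QuorumReached => try_propose().await,
                ProposalEvent::RoundAdvanced => round_start = Instant::now(),
            },
        }
    }
}

async fn try_propose() { /* attempt to cut and broadcast a batch proposal */ }
```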

Minor Changes

  • 6167485 changes some of the time constants to use Duration. This avoids the potential for incorrect conversion of these constants.
  • cefe21b moves the proposal logic to a dedicated proposal_task module. This enables unit tests of its behavior, in particular of when it is woken up.

Testing

End-to-end stress tests passed:
[Screenshot: end-to-end stress test results, March 30, 2026]

@kaimast kaimast force-pushed the feat/batch-triggers branch from 0f2041b to 84e73d5 March 30, 2026 04:30
@kaimast kaimast changed the base branch from staging to chore/update-rand March 30, 2026 22:58
@kaimast kaimast force-pushed the feat/batch-triggers branch from 40ec68f to a51872b March 30, 2026 23:02
@kaimast kaimast force-pushed the chore/update-rand branch from e1694dc to 91692c7 March 30, 2026 23:52
@kaimast kaimast force-pushed the feat/batch-triggers branch from b20642d to a3177c4 March 30, 2026 23:55
Base automatically changed from chore/update-rand to staging March 31, 2026 12:03
@kaimast kaimast marked this pull request as ready for review March 31, 2026 19:39
Collaborator

@vicsn vicsn left a comment

Cool, do you see a lower average blocktime in your stress tests or in the CI devnets?

let mut state = self.sync_state.write();
state.set_sync_height(new_height);
- !state.can_issue_new_block_requests()
+ (state.is_block_synced(), !state.can_issue_new_block_requests())
Collaborator

I may have asked this before, but when would we ever be synced but still able to issue new block requests?

Contributor Author

If you are one block behind you are still considered "synced" but could also request one more block.
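
To make that concrete with hypothetical heights (assuming a sync tolerance of one block): if the local ledger is at height 99 and the best-known peer height is 100, the node is within the tolerance, so is_block_synced() returns true, while block 100 is still outstanding, so can_issue_new_block_requests() also returns true. Once block 100 is applied, the node remains synced and there is nothing left to request.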

Collaborator

Can you write or link some docblocks or a README paragraph giving examples with specific block heights, for either validators or clients, showing how the two concepts evolve?

Collaborator

Ping

Comment thread node/bft/src/lib.rs Outdated
Comment thread node/bft/src/sync/mod.rs
Comment thread node/consensus/src/lib.rs
Comment thread node/metrics/src/names.rs
Comment thread node/bft/src/lib.rs Outdated
pub const MAX_FETCH_TIMEOUT: Duration = Duration::from_secs(3 * MAX_BATCH_DELAY.as_secs());
/// The maximum time allowed for the leader to send their certificate.
/// After this time, the node will consider the leader as failed and try to advance the round without it.
pub const MAX_LEADER_CERTIFICATE_DELAY: Duration = Duration::from_secs(2 * MAX_BATCH_DELAY.as_secs());
Collaborator

I don't believe this value is the same as before. This is now 4 seconds (2 * 2 = 4), whereas it was 5 seconds before (2 * 2500 / 1000 = 5000 / 1000 = 5).
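
A minimal runnable sketch of the truncation, assuming MAX_BATCH_DELAY is 2.5 seconds as the arithmetic above implies:

```rust
use std::time::Duration;

// Assumed value, per the 2 * 2500 / 1000 arithmetic above.
const MAX_BATCH_DELAY: Duration = Duration::from_millis(2500);

// `as_secs()` truncates toward zero, so 2.5s becomes 2s and the derived
// constant silently shrinks from the intended 5s to 4s.
const TRUNCATED: Duration = Duration::from_secs(2 * MAX_BATCH_DELAY.as_secs());

// Deriving via milliseconds keeps the sub-second component.
const EXACT: Duration = Duration::from_millis(2 * MAX_BATCH_DELAY.as_millis() as u64);

fn main() {
    assert_eq!(TRUNCATED, Duration::from_secs(4));
    assert_eq!(EXACT, Duration::from_millis(5000));
}
```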

Contributor Author

Fixed in 942daae. I also added a note to change it to simply = 2 * MAX_BATCH_DELAY once const_trait_impl is stable.

Comment thread node/bft/src/lib.rs Outdated
pub const CREATE_BATCH_INTERVAL: Duration = Duration::from_millis(250);

/// The maximum time to wait before timing out on a fetch.
pub const MAX_FETCH_TIMEOUT: Duration = Duration::from_secs(3 * MAX_BATCH_DELAY.as_secs());
Collaborator

Same truncation issue here

Comment thread node/bft/src/primary.rs
ledger: Arc<dyn LedgerService<N>>,
/// The workers.
workers: Arc<OnceLock<Vec<Worker<N>>>>,

Collaborator

nit: why the new lines here?

Contributor Author

It makes the multi-line doc comments easier to read IMHO, but I can revert it.

Comment thread node/bft/src/primary/proposal_task.rs Outdated
Collaborator

@ljedrz ljedrz left a comment

LGTM in general, though it is a non-negligible increase in the amount of logic; it would be great if in the near future we could generalize the "trigger X if Y or Z happens or a timeout occurs" logic across our use cases.

@kaimast
Contributor Author

kaimast commented Apr 9, 2026

> Do you know what kind of speed-ups we should expect in consensus? Or if there is meaningful impact at all to block time?

I am still trying to figure out a straightforward way of measuring this. My expectation is that the average block time stays about the same but the tail end of block times will be lower.

@vicsn
Collaborator

vicsn commented Apr 14, 2026

Forwarding from @kaimast: this PR introduces a significant improvement in commit latency for light networks (5 validators, weak machines, see image below), and at least no regression for normal prerelease runs.

[Screenshot: commit latency comparison, April 13, 2026]

@vicsn vicsn requested a review from ljedrz April 14, 2026 11:14
Comment thread node/bft/src/primary/proposal_task.rs Outdated
}
}

attempt += 1;
Collaborator

Do we or should we ever decrement attempt?

Collaborator

This is fine; it gets reset with every iteration of the outer loop, once the round advances or a proposal is cut.

Comment thread node/bft/src/primary/proposal_task.rs Outdated
}
};

if attempt > 1 {
Collaborator

Am I reading correctly that when attempt > 1, we first sleep for at least MIN_BATCH_DELAY, and then again for at least CREATE_BATCH_INTERVAL? That seems like too much. Or did you intend for this to be sleep_until(round_start + CREATE_BATCH_INTERVAL) or something?

Maybe you can document the role of attempt better

Contributor Author

The second and subsequent calls to sleep should return immediately, because round_start is only reset when the round number changes.
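
A minimal runnable illustration of that behavior, using tokio's sleep_until with a deadline that stays fixed for the round:

```rust
use std::time::Duration;
use tokio::time::{Instant, sleep_until};

#[tokio::main]
async fn main() {
    // `round_start` is fixed for the lifetime of a round.
    let round_start = Instant::now();
    let min_batch_delay = Duration::from_secs(1);

    // Attempt 1: waits out the remainder of the delay.
    sleep_until(round_start + min_batch_delay).await;

    // Attempt 2 in the same round: the deadline is already in the past,
    // so this returns immediately instead of sleeping again.
    let before = Instant::now();
    sleep_until(round_start + min_batch_delay).await;
    assert!(before.elapsed() < Duration::from_millis(50));
}
```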

Comment thread node/bft/src/primary/proposal_task.rs Outdated
Comment thread node/bft/src/primary.rs Outdated
Comment thread node/consensus/src/lib.rs Outdated
Comment thread node/bft/src/primary/proposal_task.rs Outdated
Collaborator

@ljedrz ljedrz left a comment

Left a new round of comments.

// Final heap use should be close to that after the first connection.
// We allow up to 1 KiB of growth to accommodate small per-task allocations that
// tokio's runtime retains internally across connections (e.g. sharded queue state).
let heap_growth = heap_after_loop.saturating_sub(heap_after_one_conn.unwrap());
Contributor Author

I had this test fail and got the following explanation from Claude:

The ~700-byte heap growth across 9 extra connect/disconnect cycles comes from Data::deserialize() calling tokio::task::spawn_blocking() during signature verification in the handshake (verify_challenge_response). When a signature arrives from the network it is a Data::Buffer(bytes) variant, so snarkvm offloads the cryptographic deserialization to a blocking thread.

Each full connect/disconnect cycle triggers this twice (once on each side of the handshake), so 9 extra cycles = 18 spawn_blocking calls beyond the baseline.

In tokio 1.52.0 (upgraded from 1.51.1 by the cargo update on this branch), PR #7757 replaced the blocking pool's injection queue with a sharded structure to improve scalability. The sharded queue retains a small amount of internal bookkeeping per submitted task even after the task completes — roughly 39 bytes per call × 18 calls ≈ 700 bytes. All of our application-level data structures (peer_pool, resolver, cache, conn.tasks, tcp.tasks) are fully cleaned up after each cycle; the residual allocation is entirely inside tokio's runtime internals.

The 1 KiB assertion bound accounts for this overhead while still being tight enough to catch any real per-connection leak (even accumulating a small struct across 9 connections would far exceed 1 KiB).

@lukasz the fix makes sense to me, but maybe you can also take a quick look.

Collaborator

this is perfectly in line with my past experiences with tokio, where some allocated memory is retained due to plumbing: tokio-rs/tokio#4031

@kaimast
Contributor Author

kaimast commented Apr 15, 2026

I think I addressed all the comments. However, the most recent stress-test run for this branch was not successful, and I need to address that before we can merge the branch.

ljedrz
ljedrz previously approved these changes Apr 16, 2026
Collaborator

@ljedrz ljedrz left a comment

LGTM with the stress-test issue resolved.

raychu86
raychu86 previously approved these changes Apr 21, 2026
@kaimast kaimast dismissed stale reviews from raychu86 and ljedrz via f1af61f April 22, 2026 03:47