Conversation
…w block is committed
Add a `commit_lock` (async mutex) to serialize concurrent calls to `commit_leader_certificate`. Without it, a second certificate crossing the availability threshold while the consensus callback for a prior commit is still in-flight would re-walk already-committed rounds (because `last_committed_round` hasn't been updated yet), causing duplicate subdag commits.
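For illustration, a minimal sketch of how such a lock could serialize commits. The field and method names (`commit_lock`, `last_committed_round`, `commit_leader_certificate`) come from the description above, but the struct shape and the atomic bookkeeping are assumptions, not the actual snarkOS code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::Mutex;

struct Bft {
    /// Serializes commits so only one caller walks the uncommitted subdag at a time.
    commit_lock: Mutex<()>,
    last_committed_round: AtomicU64,
}

impl Bft {
    async fn commit_leader_certificate(&self, leader_round: u64) {
        // Hold the lock for the entire commit. A second certificate that crosses the
        // availability threshold must wait here until `last_committed_round` is updated.
        let _guard = self.commit_lock.lock().await;
        if leader_round <= self.last_committed_round.load(Ordering::SeqCst) {
            return; // Already committed while waiting on the lock; nothing to re-walk.
        }
        // ... walk and commit the subdag up to `leader_round` ...
        self.last_committed_round.store(leader_round, Ordering::SeqCst);
    }
}
```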
vicsn left a comment
Cool, do you see a lower average blocktime in your stress tests or in the CI devnets?
    let mut state = self.sync_state.write();
    state.set_sync_height(new_height);
    !state.can_issue_new_block_requests()
    (state.is_block_synced(), !state.can_issue_new_block_requests())
I may have asked this before, but when would we ever be synced but still able to issue new block requests?
If you are one block behind you are still considered "synced" but could also request one more block.
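A hypothetical sketch of how the two predicates could differ; the field names and the one-block tolerance are assumptions for illustration, not the real `SyncState`:

```rust
/// Hypothetical sync state: "synced" tolerates a one-block lag, while block
/// requests are only stopped once no peer advertises a greater height.
struct SyncState {
    current_height: u64,
    greatest_peer_height: u64,
}

impl SyncState {
    /// Synced if within one block of the best-known peer height.
    fn is_block_synced(&self) -> bool {
        self.greatest_peer_height.saturating_sub(self.current_height) <= 1
    }

    /// New block requests can be issued while any peer is still ahead.
    fn can_issue_new_block_requests(&self) -> bool {
        self.greatest_peer_height > self.current_height
    }
}

// Example: at height 99 with the best peer at 100, the node is "synced"
// (lag of 1) but can still request block 100. At height 100 it is synced
// and has nothing left to request.
```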
Can you write or link some docblocks or a README paragraph giving examples with specific block heights, for both validators and clients, showing how the two concepts evolve?
    pub const MAX_FETCH_TIMEOUT: Duration = Duration::from_secs(3 * MAX_BATCH_DELAY.as_secs());
    /// The maximum time allowed for the leader to send their certificate.
    /// After this time, the node will consider the leader as failed and try to advance the round without it.
    pub const MAX_LEADER_CERTIFICATE_DELAY: Duration = Duration::from_secs(2 * MAX_BATCH_DELAY.as_secs());
I don't believe this value is the same as before. This is now 4 seconds (2 * 2 = 4), whereas it was 5 seconds before (2 * 2500 / 1000 = 5000 / 1000 = 5).
Fixed in 942daae. I also added a note to change it to simply `= 2 * MAX_BATCH_DELAY` once `const_trait_impl` is stable.
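One possible shape of such a fix, assuming `MAX_BATCH_DELAY` is 2.5 s as in the original millisecond constant (this is a sketch, not necessarily what 942daae does); computing in milliseconds keeps the full 5-second value in a `const` context without `const_trait_impl`:

```rust
use std::time::Duration;

/// The maximum batch delay (2.5 s, matching the original millisecond constant).
pub const MAX_BATCH_DELAY: Duration = Duration::from_millis(2500);

/// The maximum time allowed for the leader to send their certificate.
/// Computed in milliseconds so the 2.5 s base is not truncated to whole seconds.
/// Note: switch to `2 * MAX_BATCH_DELAY` once `const_trait_impl` is stable.
pub const MAX_LEADER_CERTIFICATE_DELAY: Duration =
    Duration::from_millis(2 * MAX_BATCH_DELAY.as_millis() as u64);
```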
    pub const CREATE_BATCH_INTERVAL: Duration = Duration::from_millis(250);

    /// The maximum time to wait before timing out on a fetch.
    pub const MAX_FETCH_TIMEOUT: Duration = Duration::from_secs(3 * MAX_BATCH_DELAY.as_secs());
Same truncation issue here
    ledger: Arc<dyn LedgerService<N>>,
    /// The workers.
    workers: Arc<OnceLock<Vec<Worker<N>>>>,

nit: why the new lines here?
It makes the multi-line doc comments easier to read IMHO, but I can revert it.
ljedrz left a comment
LGTM in general, though it is a non-negligible increase in the amount of logic; it would be great if in the near future we could generalize the "trigger X if Y or Z happens or a timeout occurs" logic across our use cases.
I am still trying to figure out a straightforward way of measuring this. My expectation is that the average block time stays about the same but the tail end of block times will be lower.

Forwarding from @kaimast, this PR introduces a significant improvement in commit latency for light networks (5 validators, weak machines, see image below), and at least no regression for normal prerelease runs.
    }
    }

    attempt += 1;
Do we or should we ever decrement attempt?
this is fine, it gets reset with every iteration of the outer loop, once the round advances or a proposal is cut
    }
    };

    if attempt > 1 {
Am I reading correctly that when attempt > 1, we first sleep for at least MIN_BATCH_DELAY, and then again for at least CREATE_BATCH_INTERVAL? That seems like too much. Or did you intend for this to be sleep_until(round_start + CREATE_BATCH_INTERVAL) or something?
Maybe you can document the role of attempt better
The second and subsequent calls to sleep should return immediately, because round_start is only reset when the round number changes.
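A rough sketch of that interplay; the constant value, the channel, and the loop shape are assumptions for illustration rather than the actual `proposal_task.rs` code:

```rust
use tokio::sync::watch;
use tokio::time::{sleep_until, Duration, Instant};

const MIN_BATCH_DELAY: Duration = Duration::from_secs(1); // illustrative value

async fn proposal_loop(mut round_rx: watch::Receiver<u64>) {
    let mut round_start = Instant::now();
    let mut attempt: u32 = 0;

    loop {
        // On the first attempt of a round this enforces the minimum spacing from
        // `round_start`; on retries the deadline has already passed, so it returns
        // immediately.
        sleep_until(round_start + MIN_BATCH_DELAY).await;
        attempt += 1;

        if attempt > 1 {
            // Retrying within the same round: the sleep above added no extra delay.
        }

        // ... try to cut a proposal here, otherwise wait for a wake-up event ...

        // Only a round change resets the deadline and the attempt counter.
        if round_rx.changed().await.is_ok() {
            round_start = Instant::now();
            attempt = 0;
        } else {
            break; // Round source dropped; shut down the task.
        }
    }
}
```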
ljedrz left a comment
Left a new round of comments.
    // Final heap use should be close to that after the first connection.
    // We allow up to 1 KiB of growth to accommodate small per-task allocations that
    // tokio's runtime retains internally across connections (e.g. sharded queue state).
    let heap_growth = heap_after_loop.saturating_sub(heap_after_one_conn.unwrap());
I had this test fail and got the following explanation from Claude:
The ~700-byte heap growth across 9 extra connect/disconnect cycles comes from `Data::deserialize()` calling `tokio::task::spawn_blocking()` during signature verification in the handshake (`verify_challenge_response`). When a signature arrives from the network it is a `Data::Buffer(bytes)` variant, so snarkvm offloads the cryptographic deserialization to a blocking thread. Each full connect/disconnect cycle triggers this twice (once on each side of the handshake), so 9 extra cycles = 18 `spawn_blocking` calls beyond the baseline.

In tokio 1.52.0 (upgraded from 1.51.1 by the `cargo update` on this branch), PR #7757 replaced the blocking pool's injection queue with a sharded structure to improve scalability. The sharded queue retains a small amount of internal bookkeeping per submitted task even after the task completes: roughly 39 bytes per call × 18 calls ≈ 700 bytes. All of our application-level data structures (`peer_pool`, `resolver`, `cache`, `conn.tasks`, `tcp.tasks`) are fully cleaned up after each cycle; the residual allocation is entirely inside tokio's runtime internals.

The 1 KiB assertion bound accounts for this overhead while still being tight enough to catch any real per-connection leak (even accumulating a small struct across 9 connections would far exceed 1 KiB).
@lukasz the fix makes sense to me, but maybe you can also take a quick look.
this is perfectly in line with my past experiences with tokio, where some allocated memory is retained due to plumbing: tokio-rs/tokio#4031
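For reference, a sketch of the assertion pattern under discussion; the constant and the helper function are illustrative, only the 1 KiB slack and the `saturating_sub` shape come from the quoted test:

```rust
/// Allowed residual growth: covers tokio's per-task bookkeeping retained across
/// connections while still catching any real per-connection leak.
const MAX_RESIDUAL_HEAP_GROWTH: usize = 1024; // 1 KiB

fn assert_no_per_connection_leak(heap_after_one_conn: usize, heap_after_loop: usize) {
    let heap_growth = heap_after_loop.saturating_sub(heap_after_one_conn);
    assert!(
        heap_growth <= MAX_RESIDUAL_HEAP_GROWTH,
        "unexpected heap growth across connections: {heap_growth} bytes"
    );
}
```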
I think I addressed all the comments. However, the most recent stresstest run for this branch was not successful, and I need to address that first before we can merge the branch.
ljedrz left a comment
LGTM with the stresstest issue resolved.

This PR changes the batch creation task to trigger on certain events instead of only periodically. The goal is to avoid cases where a node is not ready to propose and then waits another full `MIN_BATCH_DELAY` (1 s) instead of waiting only for the relevant conditions. It also ensures that the batch delay timer is reset whenever the node advances.
The most important changes in this PR are in `Primary::start_handlers` and the new `proposal_task.rs` module. The README details when the batch creation task is woken up.

Overview of Changes
The major change in this PR is that proposals now happen in a single location: the proposal task. It gets woken up on specific events, such as timer expiration or receiving sufficient certificates for a round. See the sketch after this section for the wake-up pattern.
Unlike before, where the task woke up every second and retried independently of round advancement, it now correctly resets its timers after advancing rounds.
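For intuition, a sketch of the event-or-timeout wake-up pattern; the constant, the `Notify` handle, and the loop body are assumptions, not the actual `proposal_task.rs` implementation:

```rust
use std::sync::Arc;
use tokio::sync::Notify;
use tokio::time::{timeout, Duration};

const MIN_BATCH_DELAY: Duration = Duration::from_secs(1); // illustrative value

async fn run_proposal_task(wake: Arc<Notify>) {
    loop {
        // Wake up when an event fires (e.g. enough certificates arrived for the
        // current round) or when the batch delay elapses, whichever comes first.
        let _ = timeout(MIN_BATCH_DELAY, wake.notified()).await;

        // ... check whether the node is ready to propose and cut a batch if so,
        //     resetting any per-round timers when the round advances ...
    }
}

// Event sources (e.g. handlers started in `Primary::start_handlers`) would call
// `wake.notify_one()` whenever a condition that may allow a proposal becomes true.
```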
Minor Changes
- Timing constants are now expressed as `Duration`. This avoids the potential for incorrect conversion of these constants.
- The proposal logic moved into a new `proposal_task` module. This enables unit testing its behavior and when it will be woken up.

Testing
End-to-end stress tests passed:
