Conversation
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
vicsn
left a comment
1./2. I would try to optimize for simplicity of the implementation. With the foundations in place, in the future we can more formally balance robustness and performance under various scenarios.
3./4. Can you take a first stab, in a Google doc, at the limits we choose? We could try to allocate, say, roughly 10 GB at any given time for all peers combined, assuming the current average block size if needed.
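As a back-of-the-envelope illustration of such a budget (all names and numbers below are hypothetical placeholders, not values from this PR), a global cap could be divided into per-peer limits:

```rust
// Hypothetical sizing sketch: derive a per-peer cap from a global sync-memory
// budget. The constants are placeholder assumptions, not values from the PR.
const GLOBAL_SYNC_BUDGET_BYTES: u64 = 10 * 1024 * 1024 * 1024; // ~10 GiB total
const MAX_SYNC_PEERS: u64 = 100; // assumed number of concurrent sync peers
const AVG_BLOCK_SIZE_BYTES: u64 = 1024 * 1024; // assumed ~1 MiB average block

// Per-peer budget, and how many in-flight blocks it allows.
const PER_PEER_BUDGET_BYTES: u64 = GLOBAL_SYNC_BUDGET_BYTES / MAX_SYNC_PEERS;
const MAX_BLOCKS_IN_FLIGHT_PER_PEER: u64 = PER_PEER_BUDGET_BYTES / AVG_BLOCK_SIZE_BYTES;

fn main() {
    // With these assumptions, each peer gets ~102 MiB, i.e. ~102 blocks in flight.
    println!("{PER_PEER_BUDGET_BYTES} bytes, {MAX_BLOCKS_IN_FLIGHT_PER_PEER} blocks");
}
```

Keeping these as named `const`s (rather than scattered literals) would also make the trade-offs easy to revisit later.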
```rust
};

#[derive(Clone, PartialEq, Eq, Hash)]
pub struct SyncToken([u8; 32]);
```
Can you document what this is and how it works? Are we exposed to worse MITM attacks compared to the `Gateway` handshake?
The token is sent in plaintext and could be observed or tampered with by an inline MITM, yes. However, it is strictly an anti-DoS mechanism to prevent unauthorized resource consumption on the validator, not a cryptographic session key. The integrity of the downloaded blocks is guaranteed by the hashes and signatures, not by the transport layer, so any MITM tampering would be instantly caught and rejected.
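To make the anti-DoS role concrete, here is a minimal, hypothetical sketch (std-only; not the PR's actual implementation) of issuing and redeeming a short-lived, single-use token; a real implementation would fill the bytes from a CSPRNG:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Placeholder mirroring the token type quoted above.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct SyncToken([u8; 32]);

/// Hypothetical issuer: tracks outstanding tokens and their expiry deadlines.
pub struct TokenIssuer {
    ttl: Duration,
    pending: HashMap<SyncToken, Instant>,
}

impl TokenIssuer {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, pending: HashMap::new() }
    }

    /// Registers a token; `bytes` should come from a CSPRNG in real code.
    pub fn issue(&mut self, bytes: [u8; 32]) -> SyncToken {
        let token = SyncToken(bytes);
        self.pending.insert(token.clone(), Instant::now() + self.ttl);
        token
    }

    /// Accepts a token at most once, and only before its deadline.
    pub fn redeem(&mut self, token: &SyncToken) -> bool {
        match self.pending.remove(token) {
            Some(deadline) => Instant::now() <= deadline,
            None => false,
        }
    }
}

fn main() {
    let mut issuer = TokenIssuer::new(Duration::from_secs(30));
    let token = issuer.issue([7u8; 32]);
    assert!(issuer.redeem(&token)); // first use succeeds
    assert!(!issuer.redeem(&token)); // single-use: a replay is rejected
    assert!(!issuer.redeem(&SyncToken([9u8; 32]))); // unissued tokens are rejected
    println!("ok");
}
```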
```rust
#[async_trait]
impl<N: Network> OnConnect for SyncStreams<N> {
    async fn on_connect(&self, peer_addr: SocketAddr) {
        // Check if we're the ones who provide the sync.
```
Can two peers sync from each other concurrently? Should we add a test for these edge cases?
This behavior is controlled by the `BlockSync` logic and remains unchanged. I don't see how such a scenario would be useful, so I'm pretty sure it is disallowed. As for the comment: this check is only required due to the modular nature of the `Tcp` plumbing. At the point of `OnConnect::on_connect`, we don't readily know the side of the connection (though this could be exposed by the `Tcp` if needed), so we look up the applicable block request (which we would need anyway) in order to check it.
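A hypothetical sketch of the lookup described above (all names and fields are placeholders, not the PR's code): since `on_connect` alone doesn't reveal which side dialed, the table of outstanding block requests can be consulted to infer the connection's role:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

// Simplified stand-in for the existing BlockRequest.
#[derive(Debug, Clone, Copy)]
struct BlockRequest {
    start_height: u32,
    end_height: u32, // exclusive
}

struct SyncStreams {
    // Requests we initiated, keyed by the peer we expect blocks from.
    outstanding: HashMap<SocketAddr, BlockRequest>,
}

impl SyncStreams {
    /// Returns true if we are the ones who provide the sync on this connection,
    /// i.e. we have no outstanding block request towards this peer.
    fn is_provider(&self, peer_addr: SocketAddr) -> bool {
        !self.outstanding.contains_key(&peer_addr)
    }
}

fn main() {
    let requester: SocketAddr = "127.0.0.1:4130".parse().unwrap();
    let mut streams = SyncStreams { outstanding: HashMap::new() };
    streams.outstanding.insert(requester, BlockRequest { start_height: 10, end_height: 20 });

    // We requested blocks from `requester`, so on that connection we receive.
    assert!(!streams.is_provider(requester));
    // Any other inbound connection means we act as the provider.
    assert!(streams.is_provider("127.0.0.1:5000".parse().unwrap()));
    println!("ok");
}
```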
```rust
// Retrieve the start (inclusive) and end (exclusive) block height.
let candidate_start_height = self.first().map(|b| b.height()).unwrap_or(0);
let candidate_end_height = 1 + self.last().map(|b| b.height()).unwrap_or(0);
// let candidate_start_height = self.first().map(|b| b.height()).unwrap_or(0);
// let candidate_end_height = 1 + self.last().map(|b| b.height()).unwrap_or(0);
// Check that the range matches the block request.
if start_height != candidate_start_height || end_height != candidate_end_height {
    bail!("Peer '{peer_ip}' sent an invalid block response (range does not match block request)")
}
// if start_height != candidate_start_height || end_height != candidate_end_height {
//     bail!("Peer '{peer_ip}' sent an invalid block response (range does not match block request)")
// }
```
If we are to reuse the existing `BlockRequest`, this check is no longer correct: the request can span a much greater range of blocks than the responses.
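One possible relaxation, sketched under the assumption that responses may cover sub-ranges of a larger `BlockRequest` (this is not the PR's final logic), is to only require that a response fit within the requested bounds:

```rust
/// Hypothetical sketch: accept any non-empty response range that lies within
/// the request's bounds, instead of demanding an exact match.
fn check_response_range(
    request_start: u32,
    request_end: u32, // exclusive
    candidate_start: u32,
    candidate_end: u32, // exclusive
) -> Result<(), String> {
    let in_bounds = request_start <= candidate_start && candidate_end <= request_end;
    let non_empty = candidate_start < candidate_end;
    if !in_bounds || !non_empty {
        return Err(format!(
            "invalid block response range {candidate_start}..{candidate_end} \
             for request {request_start}..{request_end}"
        ));
    }
    Ok(())
}

fn main() {
    // Partial responses within the request range are accepted...
    assert!(check_response_range(0, 1000, 0, 50).is_ok());
    assert!(check_response_range(0, 1000, 950, 1000).is_ok());
    // ...but a response exceeding the request bounds is still rejected.
    assert!(check_response_range(0, 1000, 990, 1010).is_err());
    // Empty ranges are rejected as well.
    assert!(check_response_range(0, 1000, 50, 50).is_err());
    println!("ok");
}
```

Whether gaps between consecutive partial responses should also be rejected is a separate design question.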
This PR is a draft implementation of dedicated sync streams listed in the node hardening issue, as an alternative to the general chunking proposal. Initially, it applies only to the validators, which will benefit from it the most.
The main reasons for the proposal are:
Benefits over chunking:
The general approach is as follows: the `Gateway`'s `Event` gains 2 new variants (in order to temporarily maintain backward compatibility via the existing `BlockRequest` and `BlockResponse`): `SyncRequest` and `SyncResponse`. When a node receives a `SyncRequest`, it responds with an address to a dedicated sync stream, and a short-lived access token that must be used in order to establish the connection. Once established, the responder sends `BlockResponse` messages (through the dedicated stream), and the existing `BlockSync` plumbing handles the rest. Once the maximum number of responses per sync stream (if such a limit is desired) is sent, a new sync stream needs to be opened in order to receive more blocks.

The rough list of code changes, enumerated for simpler referencing if need be:
1. … moved to `snarkos-node-network` to avoid circular dependencies (it will also make sense for the currently `Gateway`-only messages to reside there once this syncing is extended to non-validators).
2. New objects, `SyncToken` and `SyncResponse`, are introduced.
3. `Event` is extended with 2 new variants: `SyncRequest` (holding a `BlockRequest`) and `SyncResponse` (which holds the new `SyncResponse` struct).
4. A `SyncStreams` object is introduced; it is essentially a node that requires no special peer handling or address resolution, and has a trivial handshake. Just like the `Gateway`, it contains a clone of the sync-related `Senders` and the `LedgerService`. Using a node makes stream handling a lot simpler and - compared to ad-hoc streams - reduces potential NAT/firewall issues (since only a single listener port is involved). The `Tcp` node is very lightweight, and so are its connections.
5. The `BlockSync` plumbing is adjusted to account for the new logic.
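For illustration, the extended `Event` and the new `SyncResponse` payload might look roughly like this (the exact field layout is an assumption, not the PR's definition):

```rust
use std::net::SocketAddr;

// Placeholder for the token type introduced by the PR.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SyncToken([u8; 32]);

// Simplified stand-in for the existing BlockRequest.
#[derive(Clone, Debug)]
pub struct BlockRequest {
    pub start_height: u32,
    pub end_height: u32, // exclusive
}

// Hypothetical payload: where to connect, and the token to present there.
#[derive(Clone, Debug)]
pub struct SyncResponse {
    pub listener_addr: SocketAddr,
    pub token: SyncToken,
}

// Sketch of the extended event enum; the existing variants
// (BlockRequest, BlockResponse, etc.) are elided here.
#[derive(Clone, Debug)]
pub enum Event {
    SyncRequest(BlockRequest),
    SyncResponse(SyncResponse),
}

fn main() {
    let request = Event::SyncRequest(BlockRequest { start_height: 100, end_height: 1100 });
    let response = Event::SyncResponse(SyncResponse {
        listener_addr: "127.0.0.1:4131".parse().unwrap(),
        token: SyncToken([0u8; 32]),
    });
    // A node receiving a SyncRequest answers with a SyncResponse, after which
    // blocks flow over the dedicated stream rather than the Gateway.
    assert!(matches!(request, Event::SyncRequest(_)));
    assert!(matches!(response, Event::SyncResponse(_)));
    println!("ok");
}
```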
The current state of the PR: nodes can successfully establish dedicated sync streams and send/receive blocks, but I'm running into design details of the current `BlockSync` setup that are incompatible with the new approach; the problematic spots can be seen commented out in `block_sync.rs`. The issues I've identified thus far are:

- … `BlockLocators` (as opposed to the dedicated sync streams)
- … `BlockResponse` (or even just `Block`) messages to single peers via a single stream; the expected block ranges should also be much larger, so as to minimize the number of `SyncRequest` messages that need to be handled by the `Gateway`

Once these are resolved, the related tests will also need some adjustment.
@kaimast please let me know if you have ideas on how the aforementioned `BlockSync` integration issues can be solved while maintaining backward compatibility, or suggest how this logic could be delegated elsewhere; feel free to commit to this PR if you'd like to integrate these changes with `BlockSync` in a way that's aligned with your design and future plans.

Open questions:
consts.