feat(network): offload packet codec to an async strand pool#412
Draft
wu-vincent wants to merge 12 commits into
Draft
feat(network): offload packet codec to an async strand pool#412wu-vincent wants to merge 12 commits into
wu-vincent wants to merge 12 commits into
Conversation
Splice an ABI-compatible AsyncBatchedNetworkPeer in place of BDS's
BatchedNetworkPeer (via an onNewIncomingConnection hook), moving per-connection
decrypt/decompress/compress/encrypt onto a boost::asio strand pool once the
connection authenticates. The peer derives from a layout-faithful
BatchedNetworkPeer reconstruction so BDS's enableAsyncFlush writes the inherited
mAsyncEnabled flag, which gates activation: the whole handshake stays
synchronous on the main thread, and PacketSendEvent/PacketReceiveEvent always
fire on the main thread. The old BatchedNetworkPeer send/recv hooks are removed
(their logic now lives as virtual overrides in AsyncBatchedNetworkPeer).
WIP: the onNewIncomingConnection symbol offset is a placeholder in
src/bedrock/symbols/{windows,linux}.h -- regenerate via
'dump_symbols.py --pdb' before running. The inbound queue is a mutexed
std::queue (lock-free SPSC TODO) and the worker count is not yet configurable.
Reworks the async batched peer (#356) based on review feedback: - Reconstruct the full BatchedNetworkPeer layout (real typed members instead of opaque padding), reusing the existing SPSCQueue for both the inherited send queue and the inbound sub-packet queue (drops the mutex + std::queue). - Fold the per-connection async state into AsyncBatchedNetworkPeer via enable_shared_from_this so in-flight strand tasks keep the peer alive on teardown (no separate state object). - Use one strand per connection for both directions (Netty EventLoop model) instead of separate recv/send strands. - Replace NetworkThreadPool with a RAII EventLoopGroup (Netty naming: the pool is an EventLoopGroup, a per-connection strand is an EventLoop obtained via next()). EndstoneServer owns it through a unique_ptr; ctor starts the worker threads, dtor joins them. The flush completion callback runs inline on the main thread, removing the main-thread completion queue.
… overrides on demand Reimplement BatchedNetworkPeer as a self-contained, BDS-faithful peer (sendPacket/flush/update/_receivePacket + the trivial passthroughs) in its own batched_network_peer.cpp. AsyncBatchedNetworkPeer now subclasses it and overrides only the send/receive paths, calling the base (super) for the synchronous behaviour and reusing the inherited codec scratch members. - drop the invented splitNext/flushSync helpers and the duplicated outgoing_data_/incoming_data_*/compressible_bytes_ fields - remove the redundant closing_ flag (shared_from_this already gates teardown) - the event loop pulls decoded batches via the base _receivePacket; the main thread splits + dispatches events, mirroring BDS - regenerate the Windows symbol table from the 1.26.20.5 PDB (resolves onNewIncomingConnection)
…ction hook Move the BatchedNetworkPeer -> AsyncBatchedNetworkPeer splice out of a static AsyncBatchedNetworkPeer::splice() and directly into NetworkSystem's hook, where the connection and its peer chain already live. This lets NetworkPeer drop `friend endstone::core::AsyncBatchedNetworkPeer` (and the cross-layer forward declaration of an endstone::core type in the bedrock header); NetworkSystem -- already in the bedrock layer -- becomes the friend instead. AsyncBatchedNetworkPeer no longer reaches into bedrock internals.
The async strand decoded inbound packets by calling RakNetNetworkPeer::_receivePacket on a worker thread, racing newData() on the main thread over RakNet's unsynchronized per-connection read buffer. Insert a BufferedRakNetPeer at the bottom of an async connection's peer chain: its update() runs on the main thread (the per-tick update chain) and drains RakNet's read buffer into an SPSC queue; its _receivePacket() runs on the strand during decrypt/decompress and just pops the queue. The inner-chain update() now runs on the main thread so the drain lands there, mirroring BDS's own async-send concurrency model. RakNet's buffer is therefore only ever touched on the main thread. Also give NetworkPeer default chain-forwarding implementations for the pass-through virtuals so wrapper peers need not reimplement them.
Gate the async packet codec behind `[network] async` in endstone.toml, with `[network] threads` sizing the worker pool (0 = automatic). The EventLoopGroup is only created when async is enabled, and a warning is logged at startup noting the feature is experimental. When disabled, onNewIncomingConnection leaves BDS's synchronous peer chain untouched (zero overhead).
Add function-prologue byte patterns for NetworkSystem::onNewIncomingConnection to the Windows and Linux symbol configs, and regenerate linux.h. The Linux RVA was a placeholder 0 (the hook was a no-op); it now resolves to its real offset.
Use BEDROCK_STATIC_ASSERT_SIZE(BatchedNetworkPeer, 344, 320) instead of a Windows-only static_assert, which would fail the Linux build (libc++ shrinks the std::string and binary-stream members by 24 bytes). Pull in bedrock.h for the macro.
) The packet send/receive events live in AsyncBatchedNetworkPeer, but the splice bailed out early when [network] async was off, leaving stock BatchedNetworkPeer in place so PacketSendEvent/PacketReceiveEvent never fired. Always splice AsyncBatchedNetworkPeer so the events fire regardless; make its event loop optional and only set up the BufferedRakNetPeer + strand when async is enabled. With no event loop it never activates and stays a synchronous main-thread passthrough.
…356) Inline the send/receive event handling into sendPacket/_receivePacket and remove the std::optional<std::string> return contract. The unmodified path no longer copies the whole packet to thread it through an optional: send forwards the original `data` straight to the batch, and receive moves `raw` into out_data. Only a plugin-modified payload allocates a new buffer.
Nest the activation transition and the sync-passthrough divert under a single `!activated_` check instead of two sequential ones. No behavior change: once activated this tick or a prior one, control falls through to the async flush, recv scheduling and inner peer update.
Replace the shared io_context + per-connection strand with N single-threaded io_contexts (one std::thread each). A connection now binds to one event loop for its lifetime via EventLoopGroup::next(), so its codec work is serialized AND thread-affine -- the cipher/compression/buffer state stays hot on one core, with no strand dispatch hand-off. Modeled on Netty's EventLoopGroup.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #356.
What this does
Splices an ABI-compatible
AsyncBatchedNetworkPeerinto each connection's peer chain in place of BDS'sBatchedNetworkPeer(via anonNewIncomingConnectionhook), moving that connection's decrypt/decompress/compress/encrypt onto aboost::asioworker pool once the connection authenticates. Modelled on Netty: the pool is anEventLoopGroup, and each connection binds to oneEventLoop(a serialized strand) that handles both directions.It derives from a full layout reconstruction of
BatchedNetworkPeer(344 bytes,mAsyncEnabledat +0x150) so BDS'senableAsyncFlushwrites the inherited flag — which gates activation: the whole handshake (encryption/compression setup) stays synchronous on the main thread, so there's no async window to race.PacketSendEvent/PacketReceiveEventand packet patching always fire on the main thread; only the inner-chain codec runs on the event loop.Thread-safe receive (
BufferedRakNetPeer)The inner chain ends at BDS's
RakNetNetworkPeer, which keeps each connection's received packets in an unsynchronized per-peer buffer (mReadBufferDatas), filled byRakNetConnector::runEvents → newData()on the main thread. Decoding inbound packets on the strand would callRakNetNetworkPeer::_receivePacketfrom a worker and race that buffer.To keep RakNet's buffer main-thread-only, a lightweight
BufferedRakNetPeeris spliced at the bottom of the chain, wrapping theRakNetNetworkPeer:update()runs on the main thread (the per-tick update chain — the same thread asnewData) and drains RakNet's buffer into a single-producer/single-consumer queue;_receivePacket()runs on the strand during decrypt/decompress and just pops that queue.So the strand-side codec never touches RakNet's buffer. The inner-chain
update()runs on the main thread (it forwards, does RakNet telemetry, and the drain) — mirroring BDS's own async model, where the codecsendPacketruns on a worker whileupdate()runs on main.Data flow
sendPacketfiresPacketSendEvent+ patching and batches on the main thread;flush()extracts the batch and postspeer_->sendPacket(compress → encrypt → RakNet) to the strand.BufferedRakNetPeer::update()drains RakNet on the main thread into its SPSC queue; arecvLoopon the strand pulls from it through decrypt/decompress into a second SPSC queue; the main-thread_sortAndPacketizeEventspops the decoded queue and firesPacketReceiveEventin arrival order.NetworkPeergains default chain-forwarding implementations for the pass-through virtuals so wrapper peers (BufferedRakNetPeer) need not reimplement them.AsyncBatchedNetworkPeerusesenable_shared_from_thisso an in-flight task keeps the peer alive across teardown. TheEventLoopGroupis RAII, owned byEndstoneServerviaunique_ptr(ctor starts the workers, dtor joins them).Builds and links cleanly on Windows (
endstone_runtime.dll); the layoutstatic_assertholds, and the server accepts connections / gameplay on 1.26.20.5.Thread-safety summary
Every piece of mutable state has a single owning lane, crossing lanes only through the two SPSC queues:
mReadBufferDatas): main only (newData+BufferedRakNetPeer::updatepoll).PacketSend/ReceiveEvent, packet patching): main only.activated_: main only (non-atomic by design);recv_scheduled_: atomic, single-in-flightrecvLoopguard.Verified against BDS 1.26.10.4
update()does no send-side work.CompressedNetworkPeer/EncryptedNetworkPeerdon't overrideupdate()/flush(); they inheritNetworkPeer::update(), which is literallyif (mPeer) mPeer->update();. So running the inner-chainupdate()on the main thread only forwards (plus theBufferedRakNetPeerdrain + RakNet telemetry) and never touches codec state — making it safe alongside the strand codec. (This is also why the forwarding defaults added toNetworkPeermatch BDS exactly.)~RakNetNetworkPeerfrees its read/send buffers (noRakPeerclose, no connector deregistration),~EncryptedNetworkPeerfrees its cipher/HMAC state, etc. So when the lastshared_from_this()ref drops on a worker thread at teardown, destroying the chain there is fine.Remaining / follow-ups
[network]worker-thread-count config knob (currently auto =cpu - 2).🤖 Generated with Claude Code