feat(network): offload packet codec to an async strand pool by wu-vincent · Pull Request #412 · EndstoneMC/endstone

wu-vincent · 2026-06-02T22:53:22Z

Closes #356.

What this does

Splices an ABI-compatible AsyncBatchedNetworkPeer into each connection's peer chain in place of BDS's BatchedNetworkPeer (via an onNewIncomingConnection hook), moving that connection's decrypt/decompress/compress/encrypt onto a boost::asio worker pool once the connection authenticates. Modelled on Netty: the pool is an EventLoopGroup, and each connection binds to one EventLoop (a serialized strand) that handles both directions.

The peer derives from a full layout reconstruction of BatchedNetworkPeer (344 bytes on Windows; mAsyncEnabled at +0x150), so BDS's enableAsyncFlush writes the inherited flag, and that flag gates activation: the whole handshake (encryption/compression setup) stays synchronous on the main thread, so there is no async window to race. PacketSendEvent/PacketReceiveEvent and packet patching always fire on the main thread; only the inner-chain codec runs on the event loop.

Thread-safe receive (`BufferedRakNetPeer`)

The inner chain ends at BDS's RakNetNetworkPeer, which keeps each connection's received packets in an unsynchronized per-peer buffer (mReadBufferDatas), filled by RakNetConnector::runEvents → newData() on the main thread. Decoding inbound packets on the strand would call RakNetNetworkPeer::_receivePacket from a worker and race that buffer.

To keep RakNet's buffer main-thread-only, a lightweight BufferedRakNetPeer is spliced at the bottom of the chain, wrapping the RakNetNetworkPeer:

its update() runs on the main thread (the per-tick update chain — the same thread as newData) and drains RakNet's buffer into a single-producer/single-consumer queue;
its _receivePacket() runs on the strand during decrypt/decompress and just pops that queue.

So the strand-side codec never touches RakNet's buffer. The inner-chain update() runs on the main thread, where it forwards down the chain, does RakNet telemetry, and performs the drain. This mirrors BDS's own async model, where the codec sendPacket runs on a worker while update() runs on main.
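The handoff between update() and _receivePacket() relies on a single-producer/single-consumer queue. The following is a minimal sketch of such a queue using only the standard library. `SpscRing`, `tryPush` and `tryPop` are hypothetical names chosen for illustration; this is not the project's SPSCQueue, whose interface is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bounded SPSC ring buffer (illustrative, not Endstone's SPSCQueue).
// Exactly one thread may call tryPush and exactly one thread may call tryPop.
template <typename T>
class SpscRing {
public:
    // One slot is kept empty to tell "full" apart from "empty".
    explicit SpscRing(std::size_t capacity) : slots_(capacity + 1) {}

    // Producer side (in the design above: update() on the main thread).
    bool tryPush(T value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % slots_.size();
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // queue is full
        }
        slots_[tail] = std::move(value);
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side (in the design above: _receivePacket() on the strand).
    std::optional<T> tryPop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return std::nullopt;  // queue is empty
        }
        T value = std::move(slots_[head]);
        head_.store((head + 1) % slots_.size(), std::memory_order_release);
        return value;
    }

private:
    std::vector<T> slots_;
    std::atomic<std::size_t> head_{0};  // next slot to pop
    std::atomic<std::size_t> tail_{0};  // next slot to fill
};
```

The release store on one index paired with the acquire load on the other side is what makes a pushed element visible to the consumer without a mutex.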

Data flow

Send (main → strand): sendPacket fires PacketSendEvent + patching and batches on the main thread; flush() extracts the batch and posts peer_->sendPacket (compress → encrypt → RakNet) to the strand.
Receive (main → strand → main): BufferedRakNetPeer::update() drains RakNet on the main thread into its SPSC queue; a recvLoop on the strand pulls from it through decrypt/decompress into a second SPSC queue; the main-thread _sortAndPacketizeEvents pops the decoded queue and fires PacketReceiveEvent in arrival order.

NetworkPeer gains default chain-forwarding implementations for the pass-through virtuals so wrapper peers (BufferedRakNetPeer) need not reimplement them. AsyncBatchedNetworkPeer uses enable_shared_from_this so an in-flight task keeps the peer alive across teardown. The EventLoopGroup is RAII, owned by EndstoneServer via unique_ptr (ctor starts the workers, dtor joins them).
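To make the ownership model concrete, here is a rough sketch of an RAII event loop group written with the standard library instead of boost::asio. The class shapes, the `post` method and the round-robin policy in `next()` are assumptions made for illustration; only the names EventLoopGroup, EventLoop and next() come from the description above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical event loop: posted tasks run in FIFO order on one dedicated thread.
class EventLoop {
public:
    EventLoop() : worker_([this] { run(); }) {}
    ~EventLoop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // pending tasks are drained before the thread exits
    }
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // stopping and nothing left to run
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            lock.unlock();
            task();
            lock.lock();
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the state above exists before it starts
};

// Hypothetical group: the constructor starts the workers, the destructor joins them.
class EventLoopGroup {
public:
    explicit EventLoopGroup(std::size_t n) {
        for (std::size_t i = 0; i < (n == 0 ? 1 : n); ++i) {
            loops_.push_back(std::make_unique<EventLoop>());
        }
    }
    // Round-robin assignment (an assumption); expected to be called from one thread.
    EventLoop &next() { return *loops_[cursor_++ % loops_.size()]; }

private:
    std::vector<std::unique_ptr<EventLoop>> loops_;
    std::size_t cursor_ = 0;
};
```

A connection would call next() once and keep posting to the same loop, so all of its codec tasks run in order on one thread.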

Builds and links cleanly on Windows (endstone_runtime.dll); the layout static_assert holds, and the server accepts connections / gameplay on 1.26.20.5.

Thread-safety summary

Every piece of mutable state has a single owning lane, crossing lanes only through the two SPSC queues:

RakNet read buffer (mReadBufferDatas): main only (newData + BufferedRakNetPeer::update poll).
Codec (compress / decompress / encrypt / decrypt): strand only, serialized per connection (a connection's cipher/HMAC chain is inherently serial).
Plugin events (PacketSend/ReceiveEvent, packet patching): main only.
Two SPSC queues (raw-from-RakNet, decoded-to-main): one producer / one consumer each; the strand serializes its side and provides happens-before between handlers even though it hops worker threads.
Send batch / Batched decode buffers: main-only / strand-only, with a clean handoff at activation.
activated_: main only (non-atomic by design); recv_scheduled_: atomic, single-in-flight recvLoop guard.
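The single-in-flight guard in the last item can be pictured roughly as follows. This is a sketch under assumptions: `RecvScheduler`, `trySchedule` and `finish` are hypothetical names, and the actual recv_scheduled_ handling in the PR may differ.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard ensuring at most one receive task is queued or running.
class RecvScheduler {
public:
    // Returns true when no task is in flight; the caller should then post one.
    bool trySchedule() {
        return !scheduled_.exchange(true, std::memory_order_acq_rel);
    }

    // Called by the receive task once it has finished draining.
    void finish() {
        scheduled_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> scheduled_{false};
};
```

A guard of this shape depends on trySchedule() being retried later (for example once per tick), because data queued between the task's final pop and finish() is only picked up by the next scheduled task.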

Verified against BDS 1.26.10.4

Codec update() does no send-side work. CompressedNetworkPeer/EncryptedNetworkPeer don't override update()/flush(); they inherit NetworkPeer::update(), which is literally `if (mPeer) mPeer->update();`. So running the inner-chain update() on the main thread only forwards (plus the BufferedRakNetPeer drain + RakNet telemetry) and never touches codec state, which makes it safe alongside the strand codec. (This is also why the forwarding defaults added to NetworkPeer match BDS exactly.)
Off-main teardown is safe. The inner-chain destructors are pure memory cleanup + atomic refcount ops: ~RakNetNetworkPeer frees its read/send buffers (no RakPeer close, no connector deregistration), ~EncryptedNetworkPeer frees its cipher/HMAC state, etc. So when the last shared_from_this() ref drops on a worker thread at teardown, destroying the chain there is fine.
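The lifetime rule behind that teardown can be illustrated with a small sketch: a task that captures shared_from_this() keeps the object alive until the task itself is destroyed. `Peer` and `makeFlushTask` are hypothetical stand-ins, not the PR's AsyncBatchedNetworkPeer.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for a peer whose queued work must not outlive it.
class Peer : public std::enable_shared_from_this<Peer> {
public:
    Peer(int &flushed, bool &destroyed) : flushed_(flushed), destroyed_(destroyed) {}
    ~Peer() { destroyed_ = true; }

    // Builds a task that owns a strong reference to this peer.
    // Precondition: the peer is already managed by a std::shared_ptr.
    std::function<void()> makeFlushTask() {
        auto self = shared_from_this();
        return [self] { ++self->flushed_; };
    }

private:
    int &flushed_;     // counts executed tasks (observable after destruction)
    bool &destroyed_;  // set by the destructor
};
```

If the connection drops its own reference while such a task is still queued, the destructor runs wherever the task is released, which is why the destructors in the chain need to be safe off the main thread.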

Remaining / follow-ups

[network] worker-thread-count config knob (currently auto = cpu - 2).
50+ connection load test (ordering / teardown validation).

🤖 Generated with Claude Code

Splice an ABI-compatible AsyncBatchedNetworkPeer in place of BDS's BatchedNetworkPeer (via an onNewIncomingConnection hook), moving per-connection decrypt/decompress/compress/encrypt onto a boost::asio strand pool once the connection authenticates. The peer derives from a layout-faithful BatchedNetworkPeer reconstruction so BDS's enableAsyncFlush writes the inherited mAsyncEnabled flag, which gates activation: the whole handshake stays synchronous on the main thread, and PacketSendEvent/PacketReceiveEvent always fire on the main thread. The old BatchedNetworkPeer send/recv hooks are removed (their logic now lives as virtual overrides in AsyncBatchedNetworkPeer). WIP: the onNewIncomingConnection symbol offset is a placeholder in src/bedrock/symbols/{windows,linux}.h -- regenerate via 'dump_symbols.py --pdb' before running. The inbound queue is a mutexed std::queue (lock-free SPSC TODO) and the worker count is not yet configurable.

Reworks the async batched peer (#356) based on review feedback:

- Reconstruct the full BatchedNetworkPeer layout (real typed members instead of opaque padding), reusing the existing SPSCQueue for both the inherited send queue and the inbound sub-packet queue (drops the mutex + std::queue).
- Fold the per-connection async state into AsyncBatchedNetworkPeer via enable_shared_from_this so in-flight strand tasks keep the peer alive on teardown (no separate state object).
- Use one strand per connection for both directions (Netty EventLoop model) instead of separate recv/send strands.
- Replace NetworkThreadPool with a RAII EventLoopGroup (Netty naming: the pool is an EventLoopGroup, a per-connection strand is an EventLoop obtained via next()). EndstoneServer owns it through a unique_ptr; ctor starts the worker threads, dtor joins them.

The flush completion callback runs inline on the main thread, removing the main-thread completion queue.

… overrides on demand

Reimplement BatchedNetworkPeer as a self-contained, BDS-faithful peer (sendPacket/flush/update/_receivePacket + the trivial passthroughs) in its own batched_network_peer.cpp. AsyncBatchedNetworkPeer now subclasses it and overrides only the send/receive paths, calling the base (super) for the synchronous behaviour and reusing the inherited codec scratch members.

- drop the invented splitNext/flushSync helpers and the duplicated outgoing_data_/incoming_data_*/compressible_bytes_ fields
- remove the redundant closing_ flag (shared_from_this already gates teardown)
- the event loop pulls decoded batches via the base _receivePacket; the main thread splits + dispatches events, mirroring BDS
- regenerate the Windows symbol table from the 1.26.20.5 PDB (resolves onNewIncomingConnection)

…ction hook

Move the BatchedNetworkPeer -> AsyncBatchedNetworkPeer splice out of a static AsyncBatchedNetworkPeer::splice() and directly into NetworkSystem's hook, where the connection and its peer chain already live. This lets NetworkPeer drop `friend endstone::core::AsyncBatchedNetworkPeer` (and the cross-layer forward declaration of an endstone::core type in the bedrock header); NetworkSystem -- already in the bedrock layer -- becomes the friend instead. AsyncBatchedNetworkPeer no longer reaches into bedrock internals.

The async strand decoded inbound packets by calling RakNetNetworkPeer::_receivePacket on a worker thread, racing newData() on the main thread over RakNet's unsynchronized per-connection read buffer. Insert a BufferedRakNetPeer at the bottom of an async connection's peer chain: its update() runs on the main thread (the per-tick update chain) and drains RakNet's read buffer into an SPSC queue; its _receivePacket() runs on the strand during decrypt/decompress and just pops the queue. The inner-chain update() now runs on the main thread so the drain lands there, mirroring BDS's own async-send concurrency model. RakNet's buffer is therefore only ever touched on the main thread. Also give NetworkPeer default chain-forwarding implementations for the pass-through virtuals so wrapper peers need not reimplement them.

Gate the async packet codec behind `[network] async` in endstone.toml, with `[network] threads` sizing the worker pool (0 = automatic). The EventLoopGroup is only created when async is enabled, and a warning is logged at startup noting the feature is experimental. When disabled, onNewIncomingConnection leaves BDS's synchronous peer chain untouched (zero overhead).
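As an illustration, the corresponding section of endstone.toml might look like the fragment below. The key names come from the commit message above; treating `async` as a boolean is an assumption.

```toml
[network]
async = true   # opt in to the experimental async packet codec (assumed boolean)
threads = 0    # worker pool size; 0 = automatic
```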

Add function-prologue byte patterns for NetworkSystem::onNewIncomingConnection to the Windows and Linux symbol configs, and regenerate linux.h. The Linux RVA was a placeholder 0 (the hook was a no-op); it now resolves to its real offset.

Use BEDROCK_STATIC_ASSERT_SIZE(BatchedNetworkPeer, 344, 320) instead of a Windows-only static_assert, which would fail the Linux build (libc++ shrinks the std::string and binary-stream members by 24 bytes). Pull in bedrock.h for the macro.

The packet send/receive events live in AsyncBatchedNetworkPeer, but the splice bailed out early when [network] async was off, leaving stock BatchedNetworkPeer in place so PacketSendEvent/PacketReceiveEvent never fired. Always splice AsyncBatchedNetworkPeer so the events fire regardless; make its event loop optional and only set up the BufferedRakNetPeer + strand when async is enabled. With no event loop it never activates and stays a synchronous main-thread passthrough.

…356)

Inline the send/receive event handling into sendPacket/_receivePacket and remove the std::optional<std::string> return contract. The unmodified path no longer copies the whole packet to thread it through an optional: send forwards the original `data` straight to the batch, and receive moves `raw` into out_data. Only a plugin-modified payload allocates a new buffer.

Nest the activation transition and the sync-passthrough divert under a single `!activated_` check instead of two sequential ones. No behavior change: once activated this tick or a prior one, control falls through to the async flush, recv scheduling and inner peer update.

Replace the shared io_context + per-connection strand with N single-threaded io_contexts (one std::thread each). A connection now binds to one event loop for its lifetime via EventLoopGroup::next(), so its codec work is serialized AND thread-affine -- the cipher/compression/buffer state stays hot on one core, with no strand dispatch hand-off. Modeled on Netty's EventLoopGroup.

wu-vincent linked an issue Jun 2, 2026 that may be closed by this pull request

Async network packet processing via RewrittenBatchPeer + asio strand #356

Open

wu-vincent added the enhancement and high priority labels Jun 2, 2026

wu-vincent added 11 commits June 4, 2026 12:32

wu-vincent wants to merge 12 commits into develop from feat/async-batch
