
feat(network): offload packet codec to an async strand pool #412

Draft
wu-vincent wants to merge 12 commits into develop from feat/async-batch

wu-vincent (Member) commented Jun 2, 2026

Closes #356.

What this does

Splices an ABI-compatible AsyncBatchedNetworkPeer into each connection's peer chain in place of BDS's BatchedNetworkPeer (via an onNewIncomingConnection hook), moving that connection's decrypt/decompress/compress/encrypt onto a boost::asio worker pool once the connection authenticates. Modelled on Netty: the pool is an EventLoopGroup, and each connection binds to one EventLoop (a serialized strand) that handles both directions.

AsyncBatchedNetworkPeer derives from a full layout reconstruction of BatchedNetworkPeer (344 bytes on Windows, mAsyncEnabled at +0x150), so BDS's enableAsyncFlush writes the inherited flag. That flag gates activation: the whole handshake (encryption/compression setup) stays synchronous on the main thread, so there is no async window to race. PacketSendEvent/PacketReceiveEvent and packet patching always fire on the main thread; only the inner-chain codec runs on the event loop.
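The layout constraint can be illustrated with a minimal sketch. The struct below is a hypothetical stand-in, not the real reconstruction: it uses opaque byte arrays where the PR uses typed members, and only the total size (344) and the flag offset (+0x150) are taken from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the reconstructed base class. Only the size and
// the flag offset come from the PR description; the padding is illustrative.
struct BatchedPeerLayoutSketch {
    alignas(8) unsigned char before_flag[0x150];  // placeholder for earlier members
    bool async_enabled;                           // stands in for mAsyncEnabled
    unsigned char after_flag[344 - 0x150 - 1];    // placeholder for trailing members
};

// Compile-time checks: a mismatch fails the build instead of corrupting memory.
static_assert(offsetof(BatchedPeerLayoutSketch, async_enabled) == 0x150,
              "flag must sit where external code writes it");
static_assert(sizeof(BatchedPeerLayoutSketch) == 344, "size must match the original");

// The derived peer adds no data members in this sketch, so a write to
// offset 0x150 of the object lands on the inherited flag.
struct AsyncPeerSketch : BatchedPeerLayoutSketch {
    bool asyncRequested() const { return async_enabled; }
};

// Simulates an external writer that only knows the byte offset.
inline void writeFlagByOffset(AsyncPeerSketch& peer) {
    reinterpret_cast<unsigned char*>(&peer)[0x150] = 1;
}
```

Here writeFlagByOffset plays the role that enableAsyncFlush plays in the description: it knows nothing about the derived type, yet the derived peer observes the write through the inherited member.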

Thread-safe receive (BufferedRakNetPeer)

The inner chain ends at BDS's RakNetNetworkPeer, which keeps each connection's received packets in an unsynchronized per-peer buffer (mReadBufferDatas), filled by RakNetConnector::runEvents → newData() on the main thread. Decoding inbound packets on the strand would call RakNetNetworkPeer::_receivePacket from a worker and race that buffer.

To keep RakNet's buffer main-thread-only, a lightweight BufferedRakNetPeer is spliced at the bottom of the chain, wrapping the RakNetNetworkPeer:

  • its update() runs on the main thread (the per-tick update chain — the same thread as newData) and drains RakNet's buffer into a single-producer/single-consumer queue;
  • its _receivePacket() runs on the strand during decrypt/decompress and just pops that queue.

So the strand-side codec never touches RakNet's buffer. The inner-chain update() runs on the main thread, where it forwards down the chain, records RakNet telemetry, and performs the drain. This mirrors BDS's own async model, in which the codec sendPacket runs on a worker while update() runs on main.
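A minimal sketch of the drain-and-pop split, using only the standard library. All names here (SpscQueueSketch, BufferedPeerSketch, the poll callback) are hypothetical; the PR reuses an existing SPSCQueue whose interface is not shown, so this ring buffer is an assumption about the general shape rather than the actual implementation.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical bounded single-producer/single-consumer ring buffer.
// One slot is kept empty to distinguish "full" from "empty".
template <typename T, std::size_t Capacity>
class SpscQueueSketch {
public:
    bool tryPush(T value) {  // producer thread only
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        slots_[tail] = std::move(value);
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }
    std::optional<T> tryPop() {  // consumer thread only
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = std::move(slots_[head]);
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return value;
    }
private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Hypothetical wrapper: update() is the main-thread producer, receivePacket()
// is the event-loop consumer. `poll` stands in for reading the lower peer's
// buffer and returns std::nullopt once that buffer is empty.
class BufferedPeerSketch {
public:
    template <typename Poll>
    void update(Poll&& poll) {
        for (;;) {
            if (!pending_) pending_ = poll();
            if (!pending_) return;                    // source drained
            if (!queue_.tryPush(*pending_)) return;   // queue full: retry next tick
            pending_.reset();
        }
    }
    std::optional<std::string> receivePacket() { return queue_.tryPop(); }
private:
    SpscQueueSketch<std::string, 64> queue_;
    std::optional<std::string> pending_;  // main-thread only
};
```

Holding a packet in pending_ when the queue is full means nothing polled from the source is dropped; whether the real peer applies backpressure this way is not stated in the PR.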

Data flow

  • Send (main → strand): sendPacket fires PacketSendEvent + patching and batches on the main thread; flush() extracts the batch and posts peer_->sendPacket (compress → encrypt → RakNet) to the strand.
  • Receive (main → strand → main): BufferedRakNetPeer::update() drains RakNet on the main thread into its SPSC queue; a recvLoop on the strand pulls from it through decrypt/decompress into a second SPSC queue; the main-thread _sortAndPacketizeEvents pops the decoded queue and fires PacketReceiveEvent in arrival order.

NetworkPeer gains default chain-forwarding implementations for the pass-through virtuals so wrapper peers (BufferedRakNetPeer) need not reimplement them. AsyncBatchedNetworkPeer uses enable_shared_from_this so an in-flight task keeps the peer alive across teardown. The EventLoopGroup is RAII, owned by EndstoneServer via unique_ptr (ctor starts the workers, dtor joins them).
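The keep-alive behaviour of enable_shared_from_this can be sketched as follows. TaskQueueSketch is a hypothetical single-threaded stand-in for an event loop, and PeerSketch is not the real AsyncBatchedNetworkPeer; the point is only that a task capturing shared_from_this() holds its own reference.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop: stores tasks until runAll().
struct TaskQueueSketch {
    std::vector<std::function<void()>> tasks;
    void post(std::function<void()> task) { tasks.push_back(std::move(task)); }
    void runAll() {
        auto pending = std::move(tasks);
        tasks.clear();
        for (auto& task : pending) task();
    }  // `pending` is destroyed here, releasing whatever the tasks captured
};

class PeerSketch : public std::enable_shared_from_this<PeerSketch> {
public:
    explicit PeerSketch(bool& destroyed) : destroyed_(destroyed) {}
    ~PeerSketch() { destroyed_ = true; }

    // The posted task captures a shared_ptr to this peer, so the peer
    // outlives the connection's own reference until the task has run.
    void flush(TaskQueueSketch& loop, int& sent) {
        loop.post([self = shared_from_this(), &sent] { self->doSend(sent); });
    }
private:
    void doSend(int& sent) { ++sent; }
    bool& destroyed_;
};
```

If the owner drops its reference while a task is still queued, the destructor runs only after that task completes, on whichever thread releases the last reference.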

Builds and links cleanly on Windows (endstone_runtime.dll); the layout static_assert holds, and the server accepts connections and runs gameplay on 1.26.20.5.

Thread-safety summary

Every piece of mutable state has a single owning lane, crossing lanes only through the two SPSC queues:

  • RakNet read buffer (mReadBufferDatas): main only (newData + BufferedRakNetPeer::update poll).
  • Codec (compress / decompress / encrypt / decrypt): strand only, serialized per connection (a connection's cipher/HMAC chain is inherently serial).
  • Plugin events (PacketSend/ReceiveEvent, packet patching): main only.
  • Two SPSC queues (raw-from-RakNet, decoded-to-main): one producer / one consumer each; the strand serializes its side and provides happens-before between handlers even though it hops worker threads.
  • Send batch / Batched decode buffers: main-only / strand-only, with a clean handoff at activation.
  • activated_: main only (non-atomic by design); recv_scheduled_: atomic, single-in-flight recvLoop guard.
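One way a single-in-flight guard like recv_scheduled_ could work is a compare-exchange on an atomic flag. This is a sketch under that assumption; the PR only states that the flag is atomic and guards a single in-flight recvLoop, and RecvSchedulerSketch is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

class RecvSchedulerSketch {
public:
    // Main thread, each tick: returns true only for the caller that flips the
    // flag from false to true, i.e. the one that should post a recv loop.
    bool trySchedule() {
        bool expected = false;
        return scheduled_.compare_exchange_strong(expected, true,
                                                  std::memory_order_acq_rel);
    }
    // Event loop, when the recv loop has finished: allow the next schedule.
    void finish() { scheduled_.store(false, std::memory_order_release); }
private:
    std::atomic<bool> scheduled_{false};
};
```

A second trySchedule() before finish() returns false, so at most one recv loop per connection is queued or running at any time.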

Verified against BDS 1.26.10.4

  • Codec update() does no send-side work. CompressedNetworkPeer/EncryptedNetworkPeer don't override update()/flush(); they inherit NetworkPeer::update(), which is literally if (mPeer) mPeer->update();. So running the inner-chain update() on the main thread only forwards (plus the BufferedRakNetPeer drain + RakNet telemetry) and never touches codec state — making it safe alongside the strand codec. (This is also why the forwarding defaults added to NetworkPeer match BDS exactly.)
  • Off-main teardown is safe. The inner-chain destructors are pure memory cleanup + atomic refcount ops: ~RakNetNetworkPeer frees its read/send buffers (no RakPeer close, no connector deregistration), ~EncryptedNetworkPeer frees its cipher/HMAC state, etc. So when the last shared_from_this() ref drops on a worker thread at teardown, destroying the chain there is fine.

Remaining / follow-ups

  • [network] worker-thread-count config knob (currently auto = cpu - 2).
  • 50+ connection load test (ordering / teardown validation).

🤖 Generated with Claude Code

Splice an ABI-compatible AsyncBatchedNetworkPeer in place of BDS's
BatchedNetworkPeer (via an onNewIncomingConnection hook), moving per-connection
decrypt/decompress/compress/encrypt onto a boost::asio strand pool once the
connection authenticates. The peer derives from a layout-faithful
BatchedNetworkPeer reconstruction so BDS's enableAsyncFlush writes the inherited
mAsyncEnabled flag, which gates activation: the whole handshake stays
synchronous on the main thread, and PacketSendEvent/PacketReceiveEvent always
fire on the main thread. The old BatchedNetworkPeer send/recv hooks are removed
(their logic now lives as virtual overrides in AsyncBatchedNetworkPeer).

WIP: the onNewIncomingConnection symbol offset is a placeholder in
src/bedrock/symbols/{windows,linux}.h -- regenerate via
'dump_symbols.py --pdb' before running. The inbound queue is a mutexed
std::queue (lock-free SPSC TODO) and the worker count is not yet configurable.

wu-vincent added the enhancement and high priority labels Jun 2, 2026
wu-vincent added 11 commits June 4, 2026 12:32
Reworks the async batched peer (#356) based on review feedback:

- Reconstruct the full BatchedNetworkPeer layout (real typed members instead
  of opaque padding), reusing the existing SPSCQueue for both the inherited
  send queue and the inbound sub-packet queue (drops the mutex + std::queue).
- Fold the per-connection async state into AsyncBatchedNetworkPeer via
  enable_shared_from_this so in-flight strand tasks keep the peer alive on
  teardown (no separate state object).
- Use one strand per connection for both directions (Netty EventLoop model)
  instead of separate recv/send strands.
- Replace NetworkThreadPool with a RAII EventLoopGroup (Netty naming: the pool
  is an EventLoopGroup, a per-connection strand is an EventLoop obtained via
  next()). EndstoneServer owns it through a unique_ptr; ctor starts the worker
  threads, dtor joins them. The flush completion callback runs inline on the
  main thread, removing the main-thread completion queue.
… overrides on demand

Reimplement BatchedNetworkPeer as a self-contained, BDS-faithful peer
(sendPacket/flush/update/_receivePacket + the trivial passthroughs) in its
own batched_network_peer.cpp. AsyncBatchedNetworkPeer now subclasses it and
overrides only the send/receive paths, calling the base (super) for the
synchronous behaviour and reusing the inherited codec scratch members.

- drop the invented splitNext/flushSync helpers and the duplicated
  outgoing_data_/incoming_data_*/compressible_bytes_ fields
- remove the redundant closing_ flag (shared_from_this already gates teardown)
- the event loop pulls decoded batches via the base _receivePacket; the main
  thread splits + dispatches events, mirroring BDS
- regenerate the Windows symbol table from the 1.26.20.5 PDB (resolves
  onNewIncomingConnection)
…ction hook

Move the BatchedNetworkPeer -> AsyncBatchedNetworkPeer splice out of a static
AsyncBatchedNetworkPeer::splice() and directly into NetworkSystem's hook, where
the connection and its peer chain already live.

This lets NetworkPeer drop `friend endstone::core::AsyncBatchedNetworkPeer` (and
the cross-layer forward declaration of an endstone::core type in the bedrock
header); NetworkSystem -- already in the bedrock layer -- becomes the friend
instead. AsyncBatchedNetworkPeer no longer reaches into bedrock internals.

The async strand decoded inbound packets by calling
RakNetNetworkPeer::_receivePacket on a worker thread, racing newData() on the
main thread over RakNet's unsynchronized per-connection read buffer.

Insert a BufferedRakNetPeer at the bottom of an async connection's peer chain:
its update() runs on the main thread (the per-tick update chain) and drains
RakNet's read buffer into an SPSC queue; its _receivePacket() runs on the
strand during decrypt/decompress and just pops the queue. The inner-chain
update() now runs on the main thread so the drain lands there, mirroring BDS's
own async-send concurrency model. RakNet's buffer is therefore only ever
touched on the main thread.

Also give NetworkPeer default chain-forwarding implementations for the
pass-through virtuals so wrapper peers need not reimplement them.

Gate the async packet codec behind `[network] async` in endstone.toml, with
`[network] threads` sizing the worker pool (0 = automatic). The EventLoopGroup
is only created when async is enabled, and a warning is logged at startup noting
the feature is experimental. When disabled, onNewIncomingConnection leaves BDS's
synchronous peer chain untouched (zero overhead).
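
As a rough illustration, the corresponding endstone.toml fragment might look like the following. The section and key names come from the commit message above; the value types (a boolean for async, an integer for threads) are assumptions.

```toml
[network]
async = true   # opt in to the experimental async packet codec
threads = 0    # worker pool size; 0 = automatic
```
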
Add function-prologue byte patterns for NetworkSystem::onNewIncomingConnection
to the Windows and Linux symbol configs, and regenerate linux.h. The Linux RVA
was a placeholder 0 (the hook was a no-op); it now resolves to its real offset.

Use BEDROCK_STATIC_ASSERT_SIZE(BatchedNetworkPeer, 344, 320) instead of a
Windows-only static_assert, which would fail the Linux build (libc++ shrinks
the std::string and binary-stream members by 24 bytes). Pull in bedrock.h for
the macro.
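
A per-platform size assertion of this kind could be sketched as below. SKETCH_STATIC_ASSERT_SIZE is a hypothetical macro, not the definition of BEDROCK_STATIC_ASSERT_SIZE, and the example type assumes a 64-bit target, where long is 4 bytes on Windows and 8 bytes elsewhere.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: checks sizeof(type) against the size expected for the
// platform being compiled, so one declaration covers both builds.
#if defined(_WIN32)
#define SKETCH_STATIC_ASSERT_SIZE(type, windows_size, linux_size) \
    static_assert(sizeof(type) == (windows_size), "unexpected size on Windows")
#else
#define SKETCH_STATIC_ASSERT_SIZE(type, windows_size, linux_size) \
    static_assert(sizeof(type) == (linux_size), "unexpected size on this platform")
#endif

// Illustrative type whose size differs by platform (assumes a 64-bit target).
struct PlatformSizedSketch {
    long value;
};
SKETCH_STATIC_ASSERT_SIZE(PlatformSizedSketch, 4, 8);

// Runtime mirror of the expected size, for demonstration only.
constexpr std::size_t expectedSketchSize() {
#if defined(_WIN32)
    return 4;
#else
    return 8;
#endif
}
```
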
)

The packet send/receive events live in AsyncBatchedNetworkPeer, but the splice
bailed out early when [network] async was off, leaving stock BatchedNetworkPeer
in place so PacketSendEvent/PacketReceiveEvent never fired.

Always splice AsyncBatchedNetworkPeer so the events fire regardless; make its
event loop optional and only set up the BufferedRakNetPeer + strand when async
is enabled. With no event loop it never activates and stays a synchronous
main-thread passthrough.
…356)

Inline the send/receive event handling into sendPacket/_receivePacket and
remove the std::optional<std::string> return contract. The unmodified path no
longer copies the whole packet to thread it through an optional: send forwards
the original `data` straight to the batch, and receive moves `raw` into
out_data. Only a plugin-modified payload allocates a new buffer.

Nest the activation transition and the sync-passthrough divert under a single `!activated_` check instead of two sequential ones. No behavior change: once activated this tick or a prior one, control falls through to the async flush, recv scheduling and inner peer update.

Replace the shared io_context + per-connection strand with N single-threaded
io_contexts (one std::thread each). A connection now binds to one event loop for
its lifetime via EventLoopGroup::next(), so its codec work is serialized AND
thread-affine -- the cipher/compression/buffer state stays hot on one core, with
no strand dispatch hand-off. Modeled on Netty's EventLoopGroup.
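
The thread-affine model can be sketched with the standard library alone. This is not the PR's implementation: the real loops are single-threaded boost::asio io_contexts, while the sketch below substitutes a mutex-protected task deque per thread, and both class names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical single-threaded event loop: tasks run in post order on one thread.
class EventLoopSketch {
public:
    EventLoopSketch() : worker_([this] { run(); }) {}
    ~EventLoopSketch() {
        { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
        cv_.notify_one();
        worker_.join();  // pending tasks are drained before the thread exits
    }
    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(mutex_); tasks_.push_back(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();  // run outside the lock
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};

// RAII group: the constructor starts the workers, the destructor joins them.
class EventLoopGroupSketch {
public:
    explicit EventLoopGroupSketch(std::size_t threads) {
        if (threads == 0) threads = 1;  // sketch only: the real "automatic" sizing is not shown
        for (std::size_t i = 0; i < threads; ++i)
            loops_.push_back(std::make_unique<EventLoopSketch>());
    }
    // Round-robin assignment; a connection keeps the returned loop for life.
    EventLoopSketch& next() { return *loops_[next_++ % loops_.size()]; }
private:
    std::vector<std::unique_ptr<EventLoopSketch>> loops_;
    std::size_t next_ = 0;  // called from the main thread only in this sketch
};
```

Because every task for a connection lands on the same thread, per-connection state needs no locking of its own, which is the property the commit message attributes to the new design.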

Development

Successfully merging this pull request may close these issues.

Async network packet processing via RewrittenBatchPeer + asio strand
