fix(command): harden length-delimited + SCM IPC against panic-OOB #1260

Merged: FlorentinDUBOIS merged 3 commits into main from fix/command-ipc-panic-hardening on May 20, 2026
@FlorentinDUBOIS FlorentinDUBOIS commented May 20, 2026

Summary

Two panic-on-malformed-IPC bugs in sozu-command-lib are reachable across the worker/master trust boundary, and one of them — the length-delimited command channel — is externally reachable through the admin unix socket. Both panicked with slice index starts at N but ends at M on attacker-controlled framing. This branch hardens the receiver and adds unit-level + end-to-end regression tests.

| # | Severity | Surface | Fix |
|---|----------|---------|-----|
| 1 | HIGH | command/src/channel.rs:476, &buffer[delimiter_size()..message_len] | reject message_len < delimiter_size() and drop the bogus delimiter so the channel re-syncs |
| 2 | MEDIUM | command/src/scm_socket.rs:148-167, received_fds[index..index+len] | validate http+tls+tcp ≤ min(MAX_FDS_OUT, fds_received) before slicing |

Both bugs violate the rule stated in CLAUDE.md: "No panic on network-facing input. In parser, socket, mux, TLS, command-channel, and config paths, convert invalid traffic into … a contextual log."

Finding 1 — channel.rs slice-OOB (HIGH)

Channel::try_read_delimited_message decoded a peer-supplied usize length prefix and then sliced &buffer[delimiter_size()..message_len]. A peer that wrote a delimiter declaring message_len < delimiter_size() (e.g. 5 on a 64-bit host, where the delimiter is 8 bytes) bypassed the existing > max_buffer_size guard and panicked the receiver. This is reachable by any local user with write permission on the admin unix socket, and results in a full-proxy DoS until restart.

Patch:

  • new ChannelError::MessageLengthUnderDelimiter variant, symmetric with the existing MessageTooLarge upper-bound rejection;
  • consume the bogus delimiter bytes (self.front_buf.consume(delimiter_size())) before returning, so the channel re-syncs on the peer's next valid frame instead of looping forever on the stale header.

CWE-129 (Improper Validation of Array Index) → reachable as CWE-248 (Uncaught Exception).
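
The lower-bound check and the re-sync step can be illustrated with a minimal, standalone sketch. It is not the code from channel.rs: `FrontBuf`, `FrameError`, `encode_frame`, and `try_read_frame` are hypothetical names, and the sketch assumes a little-endian `usize` prefix whose value counts the prefix bytes themselves (which is what the `delimiter_size()..message_len` slice implies).

```rust
/// Size of the length prefix; assumed here to be one native `usize`.
fn delimiter_size() -> usize {
    std::mem::size_of::<usize>()
}

/// Hypothetical error type mirroring the two rejections described above.
#[derive(Debug, PartialEq)]
enum FrameError {
    /// The declared length is smaller than the prefix itself.
    LengthUnderDelimiter { declared: usize },
    /// The declared length exceeds the configured ceiling.
    TooLarge { declared: usize, max: usize },
}

/// Hypothetical stand-in for the channel's front buffer.
struct FrontBuf {
    data: Vec<u8>,
}

impl FrontBuf {
    /// Drops up to `n` bytes from the front of the buffer.
    fn consume(&mut self, n: usize) {
        let n = n.min(self.data.len());
        self.data.drain(..n);
    }
}

/// Builds a well-formed frame: prefix (total length) followed by the payload.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let total = delimiter_size() + payload.len();
    let mut frame = total.to_le_bytes().to_vec();
    frame.extend_from_slice(payload);
    frame
}

/// Returns `Ok(None)` when more bytes are needed and `Ok(Some(payload))`
/// once a full frame is buffered.
fn try_read_frame(buf: &mut FrontBuf, max: usize) -> Result<Option<Vec<u8>>, FrameError> {
    let prefix = delimiter_size();
    if buf.data.len() < prefix {
        return Ok(None);
    }
    let mut raw = [0u8; std::mem::size_of::<usize>()];
    raw.copy_from_slice(&buf.data[..prefix]);
    // Assumption: the prefix is little-endian and includes its own size.
    let declared = usize::from_le_bytes(raw);

    if declared < prefix {
        // Without this guard, `data[prefix..declared]` would panic.
        // Dropping the bogus prefix lets the next valid frame decode.
        buf.consume(prefix);
        return Err(FrameError::LengthUnderDelimiter { declared });
    }
    if declared > max {
        // This sketch leaves the buffer untouched on the upper-bound path.
        return Err(FrameError::TooLarge { declared, max });
    }
    if buf.data.len() < declared {
        return Ok(None);
    }
    let payload = buf.data[prefix..declared].to_vec();
    buf.consume(declared);
    Ok(Some(payload))
}
```

Feeding the buffer a prefix declaring 5 followed by a valid frame yields one `LengthUnderDelimiter` error and then the valid payload, which is the re-sync behaviour the patch aims for.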

Finding 2 — scm_socket.rs slice-OOB (MEDIUM)

ScmSocket::receive_listeners zipped the peer-supplied ListenersCount.{http,tls,tcp} lists against a fixed [RawFd; MAX_FDS_OUT] (200 slots) without bounds-checking the declared counts. A manifest declaring more entries than MAX_FDS_OUT or more entries than the FDs that actually arrived panicked the receiver on received_fds[index..index + len], or on the trailing received_fds[index..file_descriptor_length] slice when index > file_descriptor_length.

Patch:

  • validate http + tls + tcp ≤ min(MAX_FDS_OUT, fds_received) up front, with checked_add to guard against usize overflow;
  • surface the mismatch as a new ScmSocketError::ListenersCountInconsistent carrying every input that contributed to the decision;
  • rework the trailing tcp slice from received_fds[index..file_descriptor_length] to the symmetric received_fds[index..index + tcp_len] (identical result under the new invariant; no silent excess-FD acceptance).
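
The up-front validation can be sketched as follows. This is an illustration under stated assumptions, not the patched `receive_listeners`: `split_fds`, `SplitError`, and the `Fd` alias are hypothetical, the counts are passed as plain integers rather than a `ListenersCount`, and `MAX_FDS_OUT` is set to the 200 slots quoted above.

```rust
/// Capacity of the receive array; the description above cites 200 slots.
const MAX_FDS_OUT: usize = 200;

/// Stand-in for `RawFd` so the sketch stays platform-independent.
type Fd = i32;

/// Hypothetical error carrying every input that led to the rejection.
#[derive(Debug, PartialEq)]
enum SplitError {
    CountInconsistent {
        http: usize,
        tls: usize,
        tcp: usize,
        fds_received: usize,
    },
}

/// Splits the received descriptors into (http, tls, tcp) groups, refusing
/// any manifest whose declared total exceeds what actually arrived.
fn split_fds(
    received: &[Fd; MAX_FDS_OUT],
    fds_received: usize,
    http: usize,
    tls: usize,
    tcp: usize,
) -> Result<(Vec<Fd>, Vec<Fd>, Vec<Fd>), SplitError> {
    // checked_add returns None on overflow instead of wrapping or panicking.
    let total = http.checked_add(tls).and_then(|sum| sum.checked_add(tcp));

    match total {
        Some(total) if total <= fds_received.min(MAX_FDS_OUT) => {
            // Every range below ends at or before `total`, so it is in bounds.
            let http_fds = received[..http].to_vec();
            let tls_fds = received[http..http + tls].to_vec();
            let tcp_fds = received[http + tls..http + tls + tcp].to_vec();
            Ok((http_fds, tls_fds, tcp_fds))
        }
        _ => Err(SplitError::CountInconsistent {
            http,
            tls,
            tcp,
            fds_received,
        }),
    }
}
```

Checking the sum once before any slicing keeps the three ranges symmetric, so no individual slice needs its own guard.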

Protocol / security impact

  • Command channel: no wire-format change. New error path is reached on a previously-panicking input only.
  • SCM channel: no wire-format change. A well-formed ListenersCount (where counts match the FD set, as produced by ScmSocket::send_listeners) is accepted identically.
  • No change to TLS, H1, H2, proxy-protocol, or any front-facing protocol surface.
  • No new metrics, config keys, or CLI flags. doc/configure.md unchanged.

Tests added

Unit regressions (live alongside the patched code):

  • command/src/channel.rs::tests::rejects_declared_length_below_delimiter: writes a delimiter declaring a 5-byte message length via raw Write::write_all on the underlying mio unix stream and asserts the new error variant instead of the pre-fix panic.
  • command/src/scm_socket.rs::tests::rejects_listeners_count_with_more_entries_than_fds — ships a forged ListenersCount { http: 3 } with zero FDs and asserts the new error variant.

End-to-end regression (new file e2e/src/tests/command_channel_security_tests.rs, 218 lines):

  • channel_short_delimiter_does_not_panic_worker spawns a real worker via Worker::start_new_worker, writes the malformed delimiter, asserts the worker thread is still alive, then sends a real SoftStop and asserts wait_for_server_stop() returns cleanly — the second assertion is the re-sync proof.
  • scm_inconsistent_listeners_count_does_not_panic_server_constructor drives the real Server::try_new_from_config constructor with a forged SCM payload and pattern-matches on ServerError::ScmSocket(ListenersCountInconsistent). The match arm references the new variant, so a future revert of scm_socket.rs fails to compile — compile-time regression gate.

Sensitivity verified by reverting each fix locally:

  • without the channel.rs bounds check, the e2e test panics at channel.rs:498:49 with the exact original slice index starts at 8 but ends at 5 message;
  • without the scm_socket.rs guard, the e2e file fails to compile because the new enum variant is gone.
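
The compile-time gate relies on nothing more than naming the variant in a pattern. A minimal sketch, with simplified stand-ins for the real enums (the field names and the `Other` variant are assumptions made for illustration only):

```rust
/// Simplified stand-in for the SCM error type; fields are hypothetical.
#[derive(Debug)]
enum ScmSocketError {
    ListenersCountInconsistent { declared: usize, fds_received: usize },
}

/// Simplified stand-in for the server error type.
#[derive(Debug)]
enum ServerError {
    ScmSocket(ScmSocketError),
    Other(String),
}

/// True only for the rejection the regression test expects. Because the
/// pattern names `ListenersCountInconsistent`, deleting that variant turns
/// this function into a compile error rather than a silently passing test.
fn is_expected_rejection(result: &Result<(), ServerError>) -> bool {
    matches!(
        result,
        Err(ServerError::ScmSocket(
            ScmSocketError::ListenersCountInconsistent { .. }
        ))
    )
}
```

The trade-off is that such a test reports a revert as a build failure rather than a red assertion, which is why the reviewer step in the test plan distinguishes the two outcomes.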

Commands run

cargo build --all-features --locked                    # green (1m12s)
cargo +nightly fmt --all -- --check                    # clean
cargo clippy --all-targets --locked -p sozu-command-lib -- -D warnings   # clean
cargo clippy -p sozu-e2e --tests -- -D warnings        # clean
cargo test -p sozu-command-lib --locked                # 94/94 pass
cargo test -p sozu-lib --locked                        # 542/542 pass
cargo test -p sozu --locked                            # 34/34 pass
cargo test -p sozu-e2e --offline command_channel_security  # 2/2 pass

Doc updates

None required — no new config keys, metrics, CLI flags, or behaviour change for well-formed traffic. CLAUDE.md's "no panic on network-facing input" rule already covers the command-channel path; this PR brings the implementation in line with that rule.

Test plan

  • CI matrix (stable / beta / nightly × crypto-ring / crypto-aws-lc-rs / crypto-openssl / fips) all green
  • cargo +nightly fmt --check green
  • cargo test -p sozu-e2e -- command_channel_security shows 2 passing tests
  • Reviewer reverts each fix(command): commit locally to confirm the e2e tests turn red (panic for channel, compile error for SCM)

`Channel::try_read_delimited_message` decoded a peer-supplied `usize`
length prefix and then sliced `&buffer[delimiter_size()..message_len]`.
A peer that wrote a delimiter declaring `message_len < delimiter_size()`
(e.g. 5 on a 64-bit host) bypassed the `> max_buffer_size` guard and
panicked the receiver with `slice index starts at 8 but ends at 5`.

This is reachable across every length-delimited channel pair: the
master ↔ worker command IPC, the master ↔ CLI admin socket, and the
test harness. A single 8-byte write to the admin unix socket therefore
crashes the master command loop and tears down supervision —
proxy-wide DoS until the operator restarts.

Reject the bad frame with a new `ChannelError::MessageLengthUnderDelimiter`
variant, symmetric with the existing `MessageTooLarge` upper-bound
guard, and consume the bogus delimiter bytes so the channel re-syncs on
the peer's next valid frame instead of looping forever on the same
stale header.

Regression: `rejects_declared_length_below_delimiter` in
`command/src/channel.rs` writes a crafted 8-byte delimiter directly to
the underlying mio unix stream and asserts the new error variant
instead of the pre-fix panic.

CWE-129 / CWE-248 — input validation gap at a trust boundary,
reachable as an uncaught panic.

Signed-off-by: Florentin Dubois <florentin.dubois@clever.cloud>

`ScmSocket::receive_listeners` zipped the peer-supplied
`ListenersCount.{http,tls,tcp}` lists against a fixed-size
`[RawFd; MAX_FDS_OUT]` (200 slots) without bounds-checking the
declared counts. A manifest declaring more entries than
`MAX_FDS_OUT`, or more entries than the FDs that actually arrived over
SCM, panicked the worker on `received_fds[index..index + len]` (or on
the trailing `received_fds[index..file_descriptor_length]` slice when
`index > file_descriptor_length`).

The SCM channel is normally privileged (master → worker), so a panic
here requires either a buggy master, an alternate sidecar that opens
the SCM peer, or memory corruption. Even so, the bug is identical in
class to the channel.rs slice-OOB (CWE-129 / CWE-248) and violates
the project's "no panic on network-facing input" rule, which
explicitly extends to the command-channel surface.

Validate `http + tls + tcp <= min(MAX_FDS_OUT, fds_received)` up front
(with a `checked_add` guard against `usize` overflow), and surface the
mismatch as a new `ScmSocketError::ListenersCountInconsistent`
carrying every input that contributed to the decision. The trailing
`tcp` slice is reworked from `received_fds[index..file_descriptor_length]`
to the symmetric `received_fds[index..index + tcp_len]`; identical
result under the new invariant, no silent excess-FD acceptance.

Regression: `rejects_listeners_count_with_more_entries_than_fds` in
`command/src/scm_socket.rs` ships a forged `ListenersCount { http: 3
entries }` with zero FDs over a real socket pair and asserts the new
error variant instead of the pre-fix panic.

Signed-off-by: Florentin Dubois <florentin.dubois@clever.cloud>

Adds an end-to-end regression module that drives the production code
paths for both IPC framing fixes — not just the unit-level decoder.

`channel_short_delimiter_does_not_panic_worker` spawns a live worker
via the `Worker::start_new_worker` harness, writes a raw 8-byte
delimiter declaring `message_len = 5` directly to the underlying mio
unix stream, asserts the worker thread is still alive after the
malformed frame, then sends a real `SoftStop` and asserts
`wait_for_server_stop()` returns cleanly. The second assertion is the
re-sync proof: without dropping the bogus delimiter from the front
buffer (the channel.rs companion change), the SoftStop would never
decode and the harness would hang.

`scm_inconsistent_listeners_count_does_not_panic_server_constructor`
ships a forged `ListenersCount { http: 3 entries }` with zero file
descriptors over a real SCM socket pair (calling `nix::sendmsg`
directly to bypass `ScmSocket::send_listeners`, which keeps addresses
and FDs in sync by construction), then exercises the full
`Server::try_new_from_config` constructor on the receiving side and
pattern-matches on `ServerError::ScmSocket(ListenersCountInconsistent)`
instead of the pre-fix indexing panic. The match arm references the
new variant, so a future revert of `scm_socket.rs` fails to compile —
compile-time regression gate.

Sensitivity verified by reverting each fix locally:
- without the channel.rs bounds check, the e2e test panics at
  `channel.rs:498:49` with the exact original slice OOB message;
- without the scm_socket.rs guard, the file fails to compile because
  the new enum variant is gone.

Adds two `[dev-dependencies]` to `e2e/Cargo.toml`:
- `prost` to encode the forged length-delimited `ListenersCount`;
- `nix` with `socket` + `uio` features for the raw `sendmsg` shim.

Signed-off-by: Florentin Dubois <florentin.dubois@clever.cloud>
@FlorentinDUBOIS FlorentinDUBOIS force-pushed the fix/command-ipc-panic-hardening branch from 24bc851 to f0cd5ab Compare May 20, 2026 11:37
@FlorentinDUBOIS FlorentinDUBOIS self-assigned this May 20, 2026
@FlorentinDUBOIS FlorentinDUBOIS merged commit 4a176b0 into main May 20, 2026
21 checks passed
@FlorentinDUBOIS FlorentinDUBOIS deleted the fix/command-ipc-panic-hardening branch May 20, 2026 13:40