feat(lifecycle): two-phase listener close for graceful drain (P2)#11
Open
andypost wants to merge 1 commit into
Open
feat(lifecycle): two-phase listener close for graceful drain (P2)#11andypost wants to merge 1 commit into
andypost wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements a two-phase close mechanism for router listen events to ensure graceful reconfiguration. It introduces a "draining" state in nxt_listen_event_t that disarms accept(2) while allowing in-flight connections to complete before the file descriptor is released. Functional tests have been added to verify that TLS handshakes are preserved during reconfiguration and that listener FDs are correctly closed. I have no feedback to provide as there were no review comments to assess.
This was referenced May 8, 2026
Disarm listener accepts before releasing the old listener FD, keep accepted-connection refs alive until the drain completes, and post config readiness once accept is disarmed so reconfiguration does not block on idle client timeouts. Treat draining idle HTTP/TLS accepted connections like closed listeners while preserving already-started requests on their existing configuration. Add listener-drain functional coverage, including the deferred-accept timing needed for the TLS handshake case.
8e592f8 to
afce757
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 2 of the graceful-shutdown / graceful-reload plan documented in
roadmap/plan-graceful-shutdown.mdon theroadmapbranch.Splits
nxt_router_listen_socket_close()into a two-phase state machine so in-flight accepted connections complete cleanly when a listener is reconfigured under load. Pre-P2, the listener FD was released the same tick the close timer fired and any unfinished TLS handshake or accepted-but-not-yet-handled connection was RST.State machine
Phase 2 re-enters from
nxt_router_listen_event_release()when the last accepted-conn ref is dropped.finish()then drops the listener's own ref (count: 1 → 0) and freeslevvia thenxt_router_listen_event_release()call at its tail.Mirrors the
engine->shutdown+nxt_queue_is_empty()pattern already used innxt_router_worker_thread_quit().What changes
src/nxt_conn.h— newuint8_t drainingbit onnxt_listen_event_t.src/nxt_router.c—nxt_router_listen_socket_close()now guards the disarm withnxt_fd_event_is_active(), returns deferred whenlev->count > 1, and falls through to the newnxt_router_listen_socket_close_finish()when count is 1. The release path re-entersfinish()when--lev->count == 1 && lev->draining.test/test_listener_drain.py— new functional tests.Scope note
This is accepted-connection drain only. TCP connections completed by the kernel but still in the kernel listen queue waiting for userspace
accept(2)are not preserved and will be RST when the FD is released. Kernel-accept-queue drain is P5 territory and is explicitly out of scope here.Upstream lessons applied
nxt_router_listen_socket_release()unchanged, so no new unlink-without-stat hazards.nxt_fd_event_is_active()so it works on both epoll and kqueue even afternxt_router_listen_socket_delete()has already removed the read event.Tests
test/test_listener_drain.pyadds:test_listener_reconfigure_drains_inflight_tls_handshake— opens raw TCP, gives the router a beat toaccept(2), then drops TLS via reconfigure before driving the handshake. Must terminate with clean TLS error or EOF, NOTECONNRESET(the pre-P2 regression).test_listener_close_releases_fd_eventually— regression guard that the old listener FD actually closes after drain.test_listener_drain_no_dropped_accepted_connection— explicitly skipped with a clear reason. Per-connection drain is P5 territory; the test is staged here so P5 can flip it on by removing the skip marker.Verification
Independence
This PR does not depend on PR #7 (P1 graceful signal plumbing). It branches off
masterand the two changes touch disjoint files. Either order of merge works.Out of scope
graceful_timeoutescalation (P5).POST /reloadendpoint that consumes this primitive (P6).Generated by Claude Code