Skip to content

fix(conn): route to error_handler on partial-write-then-error (P3)#16

Draft
andypost wants to merge 1 commit into
masterfrom
p3-write-path-contract
Draft

fix(conn): route to error_handler on partial-write-then-error (P3)#16
andypost wants to merge 1 commit into
masterfrom
p3-write-path-contract

Conversation

@andypost
Copy link
Copy Markdown
Owner

@andypost andypost commented May 8, 2026

Summary

Phase 3 of the graceful-shutdown / graceful-reload plan documented in roadmap/plan-graceful-shutdown.md on the roadmap branch — Pattern D′ (write-path contract for non-TLS). Closes the silent-graceful-EOF gap in nxt_conn_io_write() so a partial-write-then-error sequence routes to error_handler immediately, matching the TLS path's fail-fast shape (nxt_openssl_conn_test_error at src/nxt_openssl.c:1543-1607).

What changes

src/nxt_conn_write.c — write-loop routing

Master had:

if (ret == 0 || sb.sent != 0) {
    /*
     * ret == NXT_ERROR is ignored here if some data was sent,
     * the error will be handled on the next nxt_conn_write() call.
     */
    c->sent += sb.sent;
    nxt_work_queue_add(... ready_handler ...);
    return;
}

so a write that returned NXT_ERROR after a partial send (e.g. ECONNRESET on the second writev/sendfile of the same nxt_conn_io_write() call after the first succeeded) was silently routed to ready_handler, deferring detection until the next event-loop tick. The TLS path already fails fast on the equivalent condition; this PR extends that contract to plain HTTP.

New shape:

if (ret == 0 || (ret != NXT_ERROR && sb.sent != 0)) {
    /* normal completion path - unchanged */
    ...
}

if (ret != NXT_ERROR) {
    return;
}

if (sb.sent != 0) {
    c->sent += sb.sent;
    nxt_log(task, NXT_LOG_INFO,
            "conn write: peer closed mid-response fd:%d sent:%O",
            c->socket.fd, sb.sent);
}

nxt_fd_event_block_write(engine, &c->socket);
/* falls through to error: label */

NXT_AGAIN with sb.sent != 0 still routes to ready_handler (correct — caller should retry). Only NXT_ERROR with sb.sent != 0 is the new fast-fail path, with one INFO record so operators can correlate the routing decision with the underlying syscall log emitted at nxt_socket_error_level (also INFO for ECONNRESET-class).

src/nxt_router.c — severity demote

nxt_router_get_mmap_handler() had a // FIXME placeholder logging at ALERT when the reply port's app pointer was NULL. That branch fires legitimately during graceful reload when an app's reply port outlives the app struct — benign race, not corruption. Demoted to INFO with a comment so reload-under-load doesn't flood the log.

The companion // FIXME at nxt_router.c:5914 (mmap-id-out-of-range) is intentionally left at ALERT: it represents a real protocol violation and should not be hidden. Documented inline.

Scope deviations from the plan

The plan in roadmap/plan-graceful-shutdown.md listed six sites under P3. On inspection three of them are not write-path D′:

Plan citation Actual classification Disposition
src/nxt_h1proto.c write loop Actually src/nxt_conn_write.c:121-131 (one level deeper) Fixed here
src/nxt_router.c:5898 (NULL-app) Benign reload race Demoted here
src/nxt_router.c:5914 (mmap-id-out-of-range) Protocol violation, not peer-close Left at ALERT, FIXME comment removed
src/nxt_port_socket.c:749 Read-side IPC buf-alloc TODO Out of scope (separate IPC-resilience PR)
src/nxt_port_socket.c:892 Read-side IPC buf-alloc TODO Out of scope
src/nxt_port_socket.c:1345 Port error handler TODO (read-side) Out of scope

The three nxt_port_socket.c sites are legitimate buf-alloc backpressure bugs but are read-side, not write-path; conceptually distinct from Pattern D′. Recommend tracking as a separate IPC-resilience PR. Plan doc should be updated in a follow-up.

No regression test in this PR

The new branch fires only when one nxt_conn_io_write() call has (a) one successful sendbuf followed by (b) one NXT_ERROR sendbuf in its do-while loop. On loopback with SO_LINGER{1,0}+close, the RST arrives before Unit's first sendbuf in 100% of attempts, so sb.sent stays 0 and the new branch is unreachable from a black-box test.

Deterministic regression coverage requires malloc-failure / syscall-failure injection so the second sendbuf can be forced to NXT_ERROR after the first succeeds. That harness is designed in roadmap/plan-malloc-injection.md (PR #9) and will be the natural follow-up consumer.

The fix itself is review-verified by inspection of the two-line boolean change and the four lines of new logging.

Tests

./configure --tests
make -j$(nproc)                                          # clean build, no warnings
python3 -m pytest test/test_idle_close_wait.py           # 2 pass
python3 -m pytest test/test_static.py                    # 18 pass, 1 skip

The test_static.py suite exercises the share + writev/sendfile paths the fix touches; no behavioural change observed for normal-completion or NXT_AGAIN-mid-response traffic.

Independence

This PR does not depend on PR #7 (P1 graceful signal plumbing) or PR #11 (P2 listener drain). It branches off master and the three changes touch disjoint files.

Out of scope

  • The three nxt_port_socket.c sites — separate IPC-resilience PR.
  • Engine teardown TODOs (P4).
  • Connection drain with timeout (P5).
  • The X3 POST /reload endpoint (P6).
  • Per-language hooks (P7).

Generated by Claude Code


Generated by Claude Code

Phase 3 of the graceful-shutdown / graceful-reload plan documented in
roadmap/plan-graceful-shutdown.md on the roadmap branch — Pattern D'
(write-path contract for non-TLS).

src/nxt_conn_write.c
--------------------

nxt_conn_io_write() in master ran:

    if (ret == 0 || sb.sent != 0) {
        /*
         * ret == NXT_ERROR is ignored here if some data was sent,
         * the error will be handled on the next nxt_conn_write() call.
         */
        c->sent += sb.sent;
        nxt_work_queue_add(... ready_handler ...);
        return;
    }

so a write that returned NXT_ERROR after a partial send (e.g. ECONNRESET
on the second writev/sendfile of the same nxt_conn_io_write call after
the first succeeded) was silently routed to ready_handler, deferring
detection until the next event-loop tick.  The TLS path
(nxt_openssl_conn_test_error at src/nxt_openssl.c:1543-1607) already
fails fast on the equivalent condition; this commit extends that
contract to plain HTTP.

The new routing:

    if (ret == 0 || (ret != NXT_ERROR && sb.sent != 0)) {
        /* normal completion path - unchanged */
        ...
    }

    if (ret != NXT_ERROR) {
        return;
    }

    if (sb.sent != 0) {
        c->sent += sb.sent;
        nxt_log(task, NXT_LOG_INFO,
                "conn write: peer closed mid-response fd:%d sent:%O",
                c->socket.fd, sb.sent);
    }

    nxt_fd_event_block_write(engine, &c->socket);
    /* falls through to error: label */

NXT_AGAIN with sb.sent != 0 still routes to ready_handler (correct -
caller should retry).  Only NXT_ERROR with sb.sent != 0 is the new
fast-fail path, with one INFO record so operators can correlate the
routing decision with the underlying writev/send/sendfile syscall log
emitted at nxt_socket_error_level (also INFO for ECONNRESET-class).

src/nxt_router.c
----------------

In nxt_router_get_mmap_handler(), the NULL-app-pointer branch logged at
ALERT severity (a //FIXME placeholder).  This branch fires legitimately
during graceful reload when an app's reply port outlives the app
struct - it is a benign race, not corruption.  Demoted to INFO with a
comment so reload-under-load doesn't flood the log with ALERT entries.

The companion FIXME at nxt_router.c:5914 (mmap-id-out-of-range) is
left at ALERT: it represents a real protocol violation and should not
be hidden.  See P3 plan note in plan-graceful-shutdown.md.

Scope notes
-----------

Of the six sites originally listed under P3 in plan-graceful-shutdown.md:

- src/nxt_conn_write.c (the actual h1proto write loop, one level deeper
  than the plan's "nxt_h1proto.c write loop" pointer): fixed here.
- src/nxt_router.c:5898 (get_mmap_handler NULL-app): demoted here.
- src/nxt_router.c:5914 (mmap-id-out-of-range): left at ALERT - protocol
  violation, not peer-close.
- src/nxt_port_socket.c:749, :892, :1345: re-classified as port-socket
  read-side (IPC) buf-alloc backpressure, not write-path D'.  Belongs
  in a separate IPC-resilience PR; tracked but out of scope here.

No regression test in this commit
---------------------------------

The new branch fires only when one nxt_conn_io_write() call has
(a) one successful sendbuf followed by (b) one NXT_ERROR sendbuf in
its do-while loop.  On loopback with SO_LINGER{1,0}+close, the RST
arrives before Unit's first sendbuf in 100% of attempts, so sb.sent
stays 0 and the new branch is unreachable from a black-box test.

Deterministic regression coverage requires malloc-failure / syscall-
failure injection so the second sendbuf can be forced to NXT_ERROR
after the first succeeds.  That harness is designed in
roadmap/plan-malloc-injection.md (PR #9) and will be the natural
follow-up consumer.

The fix itself is review-verified by inspection of the two-line
boolean change and the four lines of new logging.

Tests
-----

./configure --tests
make -j$(nproc)                                          clean build
python3 -m pytest test/test_idle_close_wait.py           2 pass
python3 -m pytest test/test_static.py                    18 pass, 1 skip

The static.py suite exercises the share + writev/sendfile paths the
fix touches; no behavioural change observed for normal-completion or
NXT_AGAIN-mid-response traffic.
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the connection write logic to implement a fail-fast mechanism for partial sends that encounter an error, ensuring consistency with the TLS path. It also demotes a log alert in the router to an info level for a known race condition during shutdown. The reviewer suggested updating the connection's write buffer pointer in the new error handling block to maintain state consistency.

Comment thread src/nxt_conn_write.c
Comment on lines +133 to +146
if (sb.sent != 0) {
/*
* Pattern D′: route to error_handler immediately when a write
* returns NXT_ERROR after a partial send (e.g. peer closed
* mid-response). Pre-fix, this case fell through to
* ready_handler and the error was deferred to the next
* nxt_conn_io_write() call via c->socket.error. Mirrors the
* TLS path's fail-fast contract; see nxt_openssl_conn_test_error.
*/
c->sent += sb.sent;
nxt_log(task, NXT_LOG_INFO,
"conn write: peer closed mid-response fd:%d sent:%O",
c->socket.fd, sb.sent);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the success path, c->write should be updated here to reflect the partial data sent before the error occurred. This ensures the connection state remains consistent if the error_handler or subsequent logging logic inspects the remaining buffers.

    if (sb.sent != 0) {
        /*
         * Pattern D′: route to error_handler immediately when a write
         * returns NXT_ERROR after a partial send (e.g. peer closed
         * mid-response).  Pre-fix, this case fell through to
         * ready_handler and the error was deferred to the next
         * nxt_conn_io_write() call via c->socket.error.  Mirrors the
         * TLS path's fail-fast contract; see nxt_openssl_conn_test_error.
         */
        c->sent += sb.sent;
        c->write = b;
        nxt_log(task, NXT_LOG_INFO,
                "conn write: peer closed mid-response fd:%d sent:%O",
                c->socket.fd, sb.sent);
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants