Skip to content

Fix/issue 817 idle timeout log level#824

Open
lutz-grex wants to merge 2 commits intomodelcontextprotocol:mainfrom
lutz-grex:fix/issue-817-idle-timeout-log-level
Open

Fix/issue 817 idle timeout log level#824
lutz-grex wants to merge 2 commits intomodelcontextprotocol:mainfrom
lutz-grex:fix/issue-817-idle-timeout-log-level

Conversation

@lutz-grex
Copy link
Copy Markdown
Contributor

Idle keep-alive timeout is normal zombie-session cleanup, not a transport failure. This PR stops it from producing ERROR-level log lines that are indistinguishable from real transport fatals.

Motivation and Context

When LocalSessionWorker hits its configured keep-alive timeout (default 5 min), it returns WorkerQuitReason::Fatal, which the transport worker logs at tracing::error!. The SessionConfig::keep_alive docstring explicitly
frames this as a safety net for silently-dropped HTTP connections — normal lifecycle cleanup, not an error condition.

This causes every idle reap to emit:

ERROR rmcp::transport::worker: worker quit with fatal: keep alive timeout after 300000ms, when poll next session event

Operators who wire ERROR-level alerts get paged on something that is normal by design. Downstream consumers cannot filter idle reaps without also silencing real transport fatals.

Additionally, the post-exit cleanup path in close_session could fail with SessionServiceTerminated when the worker had already exited due to idle timeout, producing a second spurious error.

Closes #817

How Has This Been Tested?

  • Added an integration test (test_keep_alive_timeout_does_not_emit_error_log) that spins up a Streamable HTTP server with a 200ms keep-alive, connects a client, waits for idle reap, and asserts: no ERROR-level logs are
    emitted, and a DEBUG log with IdleTimeout is present.
  • Added a second integration test (test_explicit_close_on_live_session_succeeds) that verifies close_session on a still-alive worker succeeds without error.
  • Existing tests pass locally (cargo test).

Breaking Changes

None. WorkerQuitReason is #[non_exhaustive], so the new IdleTimeout variant is additive. The removed LocalSessionWorkerError::KeepAliveTimeout variant was not part of the public API.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

This implements Option A from #817: a dedicated WorkerQuitReason::IdleTimeout(Duration) variant that is logged at debug level alongside the other expected-exit arms (Cancelled, TransportClosed, HandlerTerminated). True
transport failures continue to flow through Fatal and keep their ERROR severity.

Changes across two commits:

  1. d41d3c8 — Add WorkerQuitReason::IdleTimeout, return it from the keep-alive arm, log at debug. Remove unused KeepAliveTimeout error variant.
  2. 2463369 — Swallow SessionServiceTerminated in close_session when the worker has already exited, preventing a spurious error during post-exit cleanup.

Idle keep-alive timeout is normal zombie-session cleanup, not a transport failure.

Route it through a dedicated WorkerQuitReason::IdleTimeout variant.

Log it at debug level instead of treating it as a fatal error.

Remove the unused LocalSessionWorkerError::KeepAliveTimeout variant.

Closes modelcontextprotocol#817
Swallow SessionServiceTerminated in close_session when the worker has already exited.
This prevents a spurious ERROR log during the post-exit cleanup path in
spawn_session_worker.
@lutz-grex lutz-grex requested a review from a team as a code owner April 30, 2026 12:18
@github-actions github-actions Bot added T-test Testing related changes T-core Core library changes T-transport Transport layer changes labels Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

T-core Core library changes T-test Testing related changes T-transport Transport layer changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LocalSessionWorker idle keep-alive timeout emits tracing::error! for normal zombie-session reaping

1 participant