Skip to content

Socket pool liveness check#2574

Open
tylerkaraszewski wants to merge 3 commits intomainfrom
socket-pool-liveness-check
Open

Socket pool liveness check#2574
tylerkaraszewski wants to merge 3 commits intomainfrom
socket-pool-liveness-check

Conversation

@tylerkaraszewski
Copy link
Copy Markdown
Contributor

Details

This uses poll() to check if a socket has died when removing it from the pool, returning a different one if it has.

It was previously possible for a socket to die while sitting in the pool unused until we tried to write to it, at which point we'd notice the error. This checks for errors before returning the socket to the caller.

Note that it's still possible for a socket to die at any time, including after we have done this check but before the caller acts on the socket, but the exposure is now greatly reduced.

Fixed Issues

Fixes https://github.com/Expensify/Expensify/issues/613505#event-24230562693

Tests


Internal Testing Reminder: when changing bedrock, please compile auth against your new changes

@tylerkaraszewski tylerkaraszewski self-assigned this Apr 7, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 191a544bb8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread libstuff/SSocketPool.cpp Outdated
@tylerkaraszewski
Copy link
Copy Markdown
Contributor Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81ca7fc939

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread libstuff/SSocketPool.cpp
Comment on lines +66 to +68
struct pollfd pfd = {socket->s, POLLIN | POLLOUT, 0};
int ret = poll(&pfd, 1, 0);
if (ret >= 0 && !(pfd.revents & (POLLERR | POLLHUP | POLLNVAL))) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject peer-closed sockets before returning from pool

This liveness check treats a socket as healthy whenever poll() has no POLLERR|POLLHUP|POLLNVAL, but a graceful peer close is commonly reported as readable EOF (POLLIN, and POLLRDHUP only if requested) without POLLHUP. In that case this code returns a dead pooled socket, so callers still hit a failure on first I/O and the stale-socket issue remains. Please include POLLRDHUP in requested/checked events or perform an EOF probe (for example recv(..., MSG_PEEK)) before reusing the socket.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerkaraszewski is this valid?

Comment thread libstuff/SSocketPool.cpp
// Verify the socket is still alive before returning it.
if (socket->state.load() != STCPManager::Socket::CLOSED) {
struct pollfd pfd = {socket->s, POLLIN | POLLOUT, 0};
int ret = poll(&pfd, 1, 0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Retry interrupted poll instead of discarding socket

If poll() is interrupted by a signal (ret == -1 with errno == EINTR), this path falls through to stale-socket discard, even though the socket may be perfectly valid. In environments with periodic signals (profilers, timers), this can cause avoidable connection churn and extra reconnect latency. Treat EINTR as retryable here rather than marking the socket dead.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chances of this happening are so low and even if it happens the cost is just opening a new socket, I don't think this matters.

@tylerkaraszewski tylerkaraszewski requested a review from a team April 8, 2026 00:06
@melvin-bot melvin-bot bot requested review from mjasikowski and removed request for a team April 8, 2026 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants