fix(snell): drain pending data before pool reuse by missuo · Pull Request #2830 · MetaCubeX/mihomo

missuo · 2026-05-24T08:45:55Z

Summary

The reuse-mode snell client pool surfaces stale server data on every reuse, so against a real Surge snell-server (v5.0.1 observed) only the first session per pooled TCP succeeds and every subsequent one fails with EOF.

Three related changes:

Drain server data before pool reuse (transport/snell/pool.go). When the local side closes first (typical for short HTTP/1 responses), the snell server may still have its reply tail + its own zero-chunk half-close queued in our TCP receive buffer. The previous `PoolConn.Close` just sent our zero chunk and put the conn back; the next session's first read then saw that leftover data. The new `drainPendingForReuse` reads frames until `shadowaead.ErrZeroChunk` is observed, with a 500 ms deadline. If drain times out or errors, the conn is closed instead of pooled.
Cap each pooled TCP at 2 CONNECT sessions (`defaultMaxUsesPerConn`). Surge's snell-server closes a reuse-mode TCP after the second session, so without a cap the pool would hand the next caller a soon-to-be-dead socket. The cap is tracked per-conn via a new internal `pooledEntry` wrapper (no public API change beyond removing the never-called `Pool.Put`).
Retry once on stale pool conn (`adapter/outbound/snell.go`). Pooled conns can also die between put and re-get (e.g. server idle timeout). `DialContext` now retries the `pool.GetContext` + `writeHeaderContext` pair once — the second attempt either yields another idle conn or falls through to the pool factory and dials fresh.

Reproduction

Tested against a real Surge snell-server v5.0.1 with 6 sequential HTTP GETs through a `reuse=true` outbound (forced fresh `proxy.DialContext` per request via `DisableKeepAlives: true`):

Before this change:
```
[1] OK
[2] FAIL ... EOF
[3] FAIL ... EOF
[4] FAIL ... EOF
[5] FAIL ... EOF
[6] FAIL ... EOF
5/6 requests failed
```

After this change:
```
[1] OK dur=272ms
[2] OK dur=365ms
[3] OK dur=245ms
[4] OK dur=278ms
[5] OK dur=292ms
[6] OK dur=300ms
all 6 requests succeeded
```

The alternating timing matches the `maxUsesPerConn=2` cap: fresh dial (~290 ms) followed by pool hit (~250 ms).

Test plan

`go build ./...`
`go vet ./transport/snell/... ./adapter/outbound/...`
`go test ./transport/snell/...` — updated 2 existing tests for the new `uses` field and pool entry wrapper; added 3 new tests (`TestPoolConnCloseDrainsAndReturnsToPool`, `TestPoolConnCloseDiscardsWhenDrainFails`, `TestPoolConnCloseDiscardsAtMaxUsesPerConn`, `TestPoolGetContextIncrementsUses`)
End-to-end against Surge snell-server v5.0.1 — see reproduction above

When the local side of a relay closes first (typical for short HTTP/1 responses), the snell server may still have data queued in our TCP receive buffer — most importantly its own zero-chunk half-close. Returning the conn to the pool without draining surfaces that stale data on the next session's first read, breaking every reuse. Also cap each pooled TCP at 2 CONNECT sessions: Surge's snell-server (v5.0.1 observed) closes a reuse-mode TCP after the second session, so anything beyond the cap is a soon-to-be-dead socket. The outbound DialContext now retries a stale pool conn once so a write failure on a checked-out conn falls through to a fresh dial transparently. Repro against a real Surge snell-server v5.0.1: 6 sequential HTTP GETs through a reuse=true snell outbound. Before this change, 5/6 fail with EOF after the first session; after, 6/6 succeed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(snell): drain pending data before pool reuse#2830

fix(snell): drain pending data before pool reuse#2830
missuo wants to merge 1 commit into
MetaCubeX:Alphafrom
missuo:fix-snell-reuse-pool

missuo commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

missuo commented May 24, 2026

Summary

Reproduction

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant