perf: pipeline SFTP requests for upload/download (~2-3x speedup)#196

Merged
inureyes merged 4 commits into lablup:main from Yaminyam:feat/sftp-request-pipelining
May 10, 2026
Conversation

@Yaminyam
Member

@Yaminyam Yaminyam commented May 6, 2026

Summary

The high-level AsyncWrite/AsyncRead impls on File issue exactly one SFTP WRITE/READ at a time and await its STATUS/DATA reply before sending the next. Sustained throughput is therefore bounded by chunk_size / RTT — at 50 ms RTT with the default 256 KiB chunk that caps a single transfer at ~5 MiB/s no matter how fast the link is. This is #3 in the SFTP-stack analysis ("the largest unrealized optimization").
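For intuition, the ceiling falls straight out of the arithmetic. A back-of-the-envelope sketch (not code from this PR):

```rust
/// Stop-and-wait throughput ceiling in MiB/s: one chunk per round trip,
/// no matter how fast the link is.
fn stop_and_wait_mibs(chunk_bytes: u64, rtt_ms: u64) -> f64 {
    let chunks_per_sec = 1000.0 / rtt_ms as f64;
    (chunk_bytes as f64 * chunks_per_sec) / (1024.0 * 1024.0)
}

fn main() {
    // 256 KiB chunk at 50 ms RTT: 20 chunks/s -> exactly 5 MiB/s
    println!("{} MiB/s", stop_and_wait_mibs(256 * 1024, 50));
    // with up to 64 requests in flight the ceiling scales ~64x
    println!("{} MiB/s", 64.0 * stop_and_wait_mibs(256 * 1024, 50));
}
```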

This PR adds two pipelined helpers on File that keep up to N SFTP requests in flight concurrently, mirroring OpenSSH sftp(1)'s default of -R 64.

Changes

  • crates/bssh-russh-sftp/Cargo.toml: add futures = "0.3" (std + async-await features only) for FuturesUnordered.
  • crates/bssh-russh-sftp/src/client/fs/file.rs: two new public methods on File:
    • write_all_pipelined<R: AsyncRead>(reader, max_inflight) -> SftpResult<u64>: reads chunks from reader and dispatches session.write(handle, offset, chunk) futures via FuturesUnordered, topping up the pipeline as in-flight writes complete. Memory is bounded by max_inflight * write_len.
    • read_to_writer_pipelined<W: AsyncWrite>(writer, max_inflight) -> SftpResult<u64>: symmetric for downloads. Out-of-order READ responses are buffered by offset and flushed to writer once the next-expected chunk arrives.
  • src/ssh/tokio_client/file_transfer.rs: rewire upload_file/download_file/upload_dir_recursive/download_dir_recursive to use the new helpers with MAX_INFLIGHT_REQUESTS = 64.
  • ARCHITECTURE.md and docs/architecture/ssh-client.md: document bounded pipelined SFTP streaming.
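The refill discipline behind `write_all_pipelined` can be sketched as follows — a synchronous model with illustrative names, where a plain queue stands in for the `FuturesUnordered` of in-flight write futures:

```rust
use std::collections::VecDeque;

/// Model of the bounded pipeline: keep up to `max_inflight` WRITE requests
/// outstanding, topping the window back up as each completion arrives.
/// Returns (peak in-flight count, completion order of chunk indices).
/// (The real code awaits a `FuturesUnordered`, which completes in any order;
/// this model pops FIFO for determinism.)
fn pipelined_write_order(total_chunks: u64, max_inflight: usize) -> (usize, Vec<u64>) {
    let mut inflight: VecDeque<u64> = VecDeque::new();
    let mut next_chunk = 0u64;
    let mut completed = Vec::new();
    let mut peak = 0usize;
    while next_chunk < total_chunks || !inflight.is_empty() {
        // top up the pipeline to the in-flight cap
        while inflight.len() < max_inflight && next_chunk < total_chunks {
            inflight.push_back(next_chunk);
            next_chunk += 1;
        }
        peak = peak.max(inflight.len());
        // "await" one completion, freeing a slot for the next top-up
        completed.push(inflight.pop_front().unwrap());
    }
    (peak, completed)
}

fn main() {
    let (peak, done) = pipelined_write_order(10, 4);
    assert_eq!(peak, 4);        // never more than max_inflight outstanding
    assert_eq!(done.len(), 10); // every chunk eventually completes
    println!("peak in-flight = {peak}");
}
```

This is why memory stays bounded by `max_inflight * write_len`: the window never holds more than `max_inflight` chunk buffers at once.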

Review follow-up fixes

A post-implementation review found and fixed several edge cases before merge:

  • Capped server-advertised read_len/write_len against local maxima, so a malicious or broken SFTP server cannot force huge client allocations through inflated limits.
  • Bounded the download reorder queue by counting both in-flight requests and pending out-of-order responses. This keeps worst-case memory bounded even when an early READ response is delayed and later chunks arrive first.
  • Used fstat size information when available to stop scheduling unnecessary reads past EOF and to treat unexpected short reads before the known file size as an error.
  • Kept the remote download-handle shutdown behavior from main after resolving the conflict with the streaming transfer work merged in #195 (perf: stream SFTP uploads/downloads instead of buffering the whole file).
  • Removed clippy-reported useless conversions in the pipelined error path.
  • Added SFTP crate tests covering advertised chunk-size capping and end-to-end in-memory pipelined upload/download behavior.
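The advertised-limit capping amounts to a clamp before any buffer is sized from server input. A minimal sketch (the constant and names are illustrative, not the crate's actual identifiers):

```rust
/// Clamp a server-advertised transfer length to a local maximum, so a broken
/// or malicious server cannot force huge client-side buffer allocations by
/// inflating its limits. 261120 (255 KiB) is the SFTP-standard read length;
/// the crate's real constants may differ.
const LOCAL_MAX_CHUNK: u32 = 261_120;

fn effective_chunk_len(server_advertised: u32) -> u32 {
    server_advertised.min(LOCAL_MAX_CHUNK)
}

fn main() {
    // an absurd advertisement gets clamped to the local cap
    assert_eq!(effective_chunk_len(u32::MAX), 261_120);
    // a smaller, sane advertisement is honored as-is
    assert_eq!(effective_chunk_len(65_536), 65_536);
    println!("caps verified");
}
```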

Security review

  • No authentication, host-key verification, shell execution, path expansion, or secret-handling behavior is changed.
  • The main security-relevant surface is untrusted SFTP server metadata. The final implementation caps advertised transfer chunk sizes before allocating buffers and keeps both upload and download working sets bounded by local constants plus max_inflight.
  • The reorder-buffer fix prevents a slow or adversarial server response ordering from causing unbounded client memory growth during downloads.

Performance review

  • Pipelining preserves the intended throughput improvement by allowing up to 64 concurrent SFTP READ/WRITE requests.
  • Memory remains bounded: upload is limited by in-flight write chunks; download is limited by in-flight plus pending ordered chunks.
  • The fstat-guided download path avoids speculative reads beyond the known remote file size when metadata is available.

Measured impact (macOS arm64 → bssh-server v2.1.3, loopback, 1 GiB)

| op       | build                  | real    | RSS      |
|----------|------------------------|---------|----------|
| upload   | vanilla v2.1.3         | 39.30s  | 3.23 GB  |
| upload   | streaming-only (#195)  |  3.47s  |   20 MB  |
| upload   | streaming + pipelined  |  2.27s  |   49 MB  |
| download | vanilla v2.1.3         |  3.93s  | 2.17 GB  |
| download | streaming-only (#195)  |  3.41s  |   16 MB  |
| download | streaming + pipelined  |  1.34s  |  288 MB  |

Pipelining adds +53% upload throughput and +155% download throughput on top of the streaming patch. End-to-end vs. vanilla v2.1.3: upload 17× faster (27 → 451 MiB/s), download 2.9× faster (261 → 764 MiB/s). Peak RSS stays well below the unpatched levels even with 64 in-flight chunks.

Test plan

  • cargo fmt --check
  • cargo clippy -- -D warnings
  • cargo test -p bssh-russh-sftp
  • cargo test --lib --verbose (1222 passed, 9 ignored)
  • cargo test --tests --verbose -- --skip integration_test
  • GitHub Actions CI on commit 9ac61d29f92576dc31a573d24c0e7048e0213e8d
  • 1 GiB upload via the patched client to a vanilla bssh-server v2.1.3; file integrity verified (size and md5 match the source)
  • 1 GiB download verified

The high-level `AsyncWrite`/`AsyncRead` impls on `File` issue exactly
one SFTP `WRITE`/`READ` at a time and `await` its `STATUS`/`DATA` reply
before sending the next. Sustained throughput is therefore bounded by
`chunk_size / RTT` — at 50 ms RTT with the default 256 KiB chunk that
caps a single transfer at ~5 MiB/s no matter how fast the link is.

Add two pipelined helpers on `File` that keep up to N SFTP requests in
flight concurrently, mirroring how OpenSSH's `sftp(1)` client behaves
(`-R 64` by default):

* `File::write_all_pipelined<R: AsyncRead>(reader, max_inflight)` —
  reads chunks from `reader` and dispatches `session.write(...)` futures
  via `FuturesUnordered`, refilling the pipeline as in-flight writes
  complete. Memory bounded by `max_inflight * write_len`.
* `File::read_to_writer_pipelined<W: AsyncWrite>(writer, max_inflight)` —
  symmetric for downloads. Out-of-order responses are buffered in a
  `BTreeMap` keyed by offset and flushed to `writer` as soon as the
  next-expected chunk arrives.
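The reorder-and-flush step can be sketched roughly like this (names invented; the real method streams into an `AsyncWrite` rather than returning a `Vec`):

```rust
use std::collections::BTreeMap;

/// Out-of-order READ responses keyed by file offset; the contiguous prefix
/// is flushed as soon as the next-expected offset arrives.
struct ReorderBuf {
    pending: BTreeMap<u64, Vec<u8>>,
    next_offset: u64,
}

impl ReorderBuf {
    fn new() -> Self {
        Self { pending: BTreeMap::new(), next_offset: 0 }
    }

    /// Accept one response; return whatever bytes are now ready in order.
    fn accept(&mut self, offset: u64, data: Vec<u8>) -> Vec<u8> {
        self.pending.insert(offset, data);
        let mut ready = Vec::new();
        // drain the contiguous run starting at the next expected offset
        while let Some(chunk) = self.pending.remove(&self.next_offset) {
            self.next_offset += chunk.len() as u64;
            ready.extend_from_slice(&chunk);
        }
        ready
    }
}

fn main() {
    let mut buf = ReorderBuf::new();
    // a response for offset 4 arrives early: buffered, nothing flushed yet
    assert!(buf.accept(4, b"efgh".to_vec()).is_empty());
    // offset 0 arrives: both chunks flush in order
    assert_eq!(buf.accept(0, b"abcd".to_vec()), b"abcdefgh".to_vec());
    println!("reorder flush ok");
}
```

The follow-up fix bounds this structure by counting `pending` entries together with in-flight requests, so a stalled early response cannot let the map grow without limit.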

Wire `Client::upload_file`/`download_file`/`upload_dir_recursive`/
`download_dir_recursive` to use the new helpers with `MAX_INFLIGHT_REQUESTS = 64`.

Measured on macOS arm64 against `bssh-server` v2.1.3 on loopback with a
1 GiB file:

| op       | build                  | real    | RSS      |
|----------|------------------------|---------|----------|
| upload   | vanilla v2.1.3         | 39.30s  | 3.23 GB  |
| upload   | streaming-only         |  3.47s  |   20 MB  |
| upload   | streaming + pipelined  |  2.27s  |   49 MB  |
| download | vanilla v2.1.3         |  3.93s  | 2.17 GB  |
| download | streaming-only         |  3.41s  |   16 MB  |
| download | streaming + pipelined  |  1.34s  |  288 MB  |

Pipelining adds ~+53% on upload and ~+155% on download throughput on top
of the streaming patch (which already eliminated the whole-file load).
Peak RSS stays well below the unpatched levels: download holds at most
~`max_inflight` chunks pending in the reorder map, and upload caps at
`max_inflight * chunk_size + reader buffer`.
Yaminyam and others added 3 commits May 7, 2026 14:04
@inureyes inureyes merged commit 487301c into lablup:main May 10, 2026
2 checks passed
@inureyes inureyes self-assigned this May 10, 2026
inureyes pushed a commit that referenced this pull request May 10, 2026
The bssh-server hard-capped every SFTP `READ` reply at 64 KiB
(`MAX_READ_SIZE = 65536`) regardless of what the client requested.
`bssh-russh-sftp` and OpenSSH's `sftp-server` both use the SFTP standard
`MAX_READ_LENGTH = 261120` (255 KiB) for request sizing, so a client
asking for a 256 KiB chunk only ever got 64 KiB back, forcing it to
issue four extra requests for the same byte stream.

Bump `MAX_READ_SIZE` to `261120` so server replies match the standard
chunk size used by the rest of the stack.  Combined with client-side
pipelining (#196), this directly cuts the per-MiB request count on
downloads from 16 → 4.

Memory exposure stays bounded: handles are still capped at
`MAX_HANDLES = 1000` per session and each in-flight read still uses
a single per-request buffer of this size (max ~255 KiB × in-flight
requests).
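The 16 → 4 figure is the amortized request count per MiB at each reply-size cap (end-of-file rounding ignored; a sketch of the arithmetic, not server code):

```rust
/// Amortized READ requests needed per MiB of download, given the
/// server's per-reply size cap in bytes.
fn reads_per_mib(max_read_bytes: f64) -> f64 {
    (1024.0 * 1024.0) / max_read_bytes
}

fn main() {
    // old 64 KiB cap: exactly 16 requests per MiB
    println!("{:.0}", reads_per_mib(65_536.0));
    // 255 KiB (261120-byte) cap: ~4 requests per MiB
    println!("{:.2}", reads_per_mib(261_120.0));
}
```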
inureyes added a commit that referenced this pull request May 10, 2026
SFTP transfer performance release. Bumps Cargo workspace version to 2.1.4 and refreshes README.md, CHANGELOG.md, debian/changelog, and the three man pages (bssh.1, bssh-server.8, bssh-keygen.1) to match.

Three perf changes ship since v2.1.3:

- Stream SFTP uploads/downloads in 255 KiB chunks instead of buffering whole files (#195) — peak RSS drops ~160x and uploads run ~11x faster on 1 GiB transfers, multi-GB transfers no longer OOM the client.
- Pipeline up to 64 concurrent SFTP requests for upload/download (#196), with server-advertised read/write lengths capped against local maxima and the download reorder queue bounded across both in-flight and pending out-of-order responses.
- Raise bssh-server SFTP MAX_READ_SIZE from 64 KiB to the 255 KiB SFTP standard (#197), cutting per-MiB request count on downloads from 16 to 4 when combined with client pipelining.