
perf: stream SFTP uploads/downloads instead of buffering whole file#195

Merged
inureyes merged 2 commits into lablup:main from Yaminyam:feat/streaming-file-transfer
May 10, 2026
Conversation

@Yaminyam
Member

@Yaminyam Yaminyam commented May 6, 2026

Summary

Upload (`upload_file`, `upload_dir_recursive`) used `tokio::fs::read` to load the entire local file into a `Vec<u8>` before calling `write_all`, and download (`download_file`, `download_dir_recursive`) used `read_to_end` into a pooled buffer plus a `clone()` into a separate `Vec` before writing locally. For multi-GB transfers this means peak RSS scales with file size, and large files can OOM the client.

Replace each path with a small `stream_copy()` helper that loops on 255 KiB reads and writes through the existing `AsyncRead`/`AsyncWrite` implementations on `tokio::fs::File` and `russh_sftp::client::fs::File`. The chunk size matches the russh-sftp default `MAX_WRITE_LENGTH`, so each chunk stays within a single SFTP read/write packet, avoiding the 255 KiB + 1 KiB split that a 256 KiB buffer would trigger.

Review follow-up commit ec04b839 tightened the implementation by aligning the chunk size to the actual 255 KiB packet limit, explicitly closing remote SFTP file handles after successful downloads, adding focused chunk-boundary unit coverage, and documenting the streaming transfer behavior in the architecture docs.
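
The helper itself is not quoted in this thread; as a rough illustration, here is a minimal synchronous sketch of the same bounded-buffer loop (the real code goes through tokio's `AsyncRead`/`AsyncWrite`; the name `stream_copy_sync` and the in-memory endpoints are illustrative, not the PR's actual code):

```rust
use std::io::{Read, Result, Write};

/// SFTP packet payload limit matching russh-sftp's default MAX_WRITE_LENGTH.
const CHUNK: usize = 255 * 1024; // 255 KiB, deliberately not 256 KiB

/// Copy `src` to `dst` one chunk at a time; peak memory is one CHUNK buffer,
/// independent of file size.
fn stream_copy_sync<R: Read, W: Write>(src: &mut R, dst: &mut W) -> Result<u64> {
    let mut buf = vec![0u8; CHUNK];
    let mut total = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    dst.flush()?;
    Ok(total)
}

fn main() -> Result<()> {
    // 1 MiB copies as 4 full 255 KiB chunks plus one 4 KiB remainder.
    let data = vec![0xABu8; 1024 * 1024];
    let mut src = &data[..];
    let mut dst = Vec::new();
    let copied = stream_copy_sync(&mut src, &mut dst)?;
    assert_eq!(dst, data);
    println!("copied {} bytes", copied); // prints: copied 1048576 bytes
    Ok(())
}
```

In the async version the same loop shape applies, with `read`/`write_all`/`flush` awaited on the tokio and russh-sftp file handles.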

Measured impact

Verified locally on macOS arm64 against bssh-server v2.1.3 over loopback with a 1 GiB file:

| Op       | Build     | real   | RSS     |
|----------|-----------|--------|---------|
| upload   | unpatched | 38.65s | 3.23 GB |
| upload   | streaming | 3.47s  | 20 MB   |
| download | unpatched | 3.93s  | 2.17 GB |
| download | streaming | 3.41s  | 16 MB   |

Peak RSS drops ~160x and uploads complete ~11x faster (a single multi-MB `write_all` apparently serializes much worse through the SFTP pipeline than chunked writes).

Review result

  • Correctness: The upload/download single-file and recursive directory paths now stream instead of buffering whole files, preserving existing error propagation and flush behavior while explicitly closing downloaded remote handles.
  • Security: No authentication, host-key verification, command execution, credential handling, or logging behavior is changed. No new credential exposure, command injection, path expansion, or trust-boundary issue was found in the PR-scoped changes.
  • Performance: The transfer payload memory is bounded to one 255 KiB buffer per active file copy, avoiding file-size-scaled RSS and avoiding accidental extra SFTP write fragmentation from the previous 256 KiB review version.
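
The fragmentation point in the last bullet is plain packet arithmetic; a quick check (the values come from this PR, the ceiling-division sketch is illustrative):

```rust
fn main() {
    let max_write = 255 * 1024; // russh-sftp default MAX_WRITE_LENGTH
    let buf_256 = 256 * 1024;   // buffer size in the earlier review version

    // A 256 KiB buffer exceeds the packet limit, so each write splits into
    // ceil(256 KiB / 255 KiB) = 2 SFTP WRITE packets: 255 KiB + 1 KiB.
    let packets = (buf_256 + max_write - 1) / max_write;
    println!(
        "256 KiB buffer -> {} packets ({} + {} bytes)",
        packets,
        max_write,
        buf_256 - max_write
    ); // prints: 256 KiB buffer -> 2 packets (261120 + 1024 bytes)
}
```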

Test plan

  • `cargo check` clean
  • `cargo build --release` clean
  • 1 GiB upload via patched client to bssh-server v2.1.3 loopback; verified file integrity (md5 matches source)
  • 1 GiB download verified
  • `cargo test --no-run`
  • `cargo test ssh::tokio_client::file_transfer::tests::stream_copy_writes_sftp_sized_chunks`
  • `cargo fmt --check`
  • `cargo clippy -- -D warnings`
  • `cargo test --lib --verbose`
  • `cargo test --tests --verbose -- --skip integration_test`
  • GitHub CI on ec04b839

Upload (`upload_file`, `upload_dir_recursive`) used `tokio::fs::read` to
load the entire local file into a `Vec<u8>` before calling `write_all`,
and download (`download_file`, `download_dir_recursive`) used
`read_to_end` into a pooled buffer + `clone()` to a separate `Vec`
before writing locally. For multi-GB transfers this means peak RSS
scales with file size and large files OOM the client.

Replace each path with a small `stream_copy()` helper that loops on
256 KiB reads and writes through the existing `AsyncRead`/`AsyncWrite`
implementations on `tokio::fs::File` and `russh_sftp::client::fs::File`.
Buffer size matches the SFTP MAX_WRITE_LENGTH so each chunk maps to a
single SFTP packet without further fragmentation.

Verified locally on macOS arm64 against `bssh-server` v2.1.3 over
loopback with a 1 GiB file:

| Op       | Build      | real    | RSS      |
|----------|------------|---------|----------|
| upload   | unpatched  | 38.65s  | 3.23 GB  |
| upload   | streaming  |  3.47s  |   20 MB  |
| download | unpatched  |  3.93s  | 2.17 GB  |
| download | streaming  |  3.41s  |   16 MB  |

Peak RSS drops ~160x and uploads complete ~11x faster (a single
multi-MB `write_all` apparently serializes much worse through the SFTP
pipeline than 256 KiB chunked writes).

Match the streaming buffer to russh-sftp's 255 KiB packet limit so each
chunk stays within one SFTP read/write request, explicitly close
downloaded remote handles after successful copies, and cover the
chunking behavior with a focused unit test.

@inureyes inureyes merged commit 0a9f647 into lablup:main May 10, 2026
2 checks passed
@inureyes inureyes self-requested a review May 10, 2026 09:33
inureyes added a commit that referenced this pull request May 10, 2026
SFTP transfer performance release. Bumps Cargo workspace version to 2.1.4 and refreshes README.md, CHANGELOG.md, debian/changelog, and the three man pages (bssh.1, bssh-server.8, bssh-keygen.1) to match.

Three perf changes ship since v2.1.3:

- Stream SFTP uploads/downloads in 255 KiB chunks instead of buffering whole files (#195) — peak RSS drops ~160x and uploads run ~11x faster on 1 GiB transfers, multi-GB transfers no longer OOM the client.
- Pipeline up to 64 concurrent SFTP requests for upload/download (#196), with server-advertised read/write lengths capped against local maxima and the download reorder queue bounded across both in-flight and pending out-of-order responses.
- Raise bssh-server SFTP MAX_READ_SIZE from 64 KiB to the 255 KiB SFTP standard (#197), cutting per-MiB request count on downloads from 16 to 4 when combined with client pipelining.
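
The bounded reorder queue mentioned for #196 can be sketched as a windowed request loop: keep a limited number of reads in flight and drain completions strictly in offset order. This is an illustrative simulation, not the actual implementation (the window size and chunk length are taken from the release notes; the out-of-order arrival is simulated):

```rust
use std::collections::BTreeMap;

const WINDOW: usize = 64;      // max concurrent SFTP requests (#196)
const CHUNK: u64 = 255 * 1024; // read length per request (#195/#197)

fn main() {
    let file_len: u64 = 1024 * 1024;
    // Simulate worst-case out-of-order completion: responses arrive reversed.
    let mut offsets: Vec<u64> = (0..file_len).step_by(CHUNK as usize).collect();
    offsets.reverse();

    let mut pending: BTreeMap<u64, u64> = BTreeMap::new(); // offset -> length
    let mut next_write: u64 = 0;
    for off in offsets {
        let len = CHUNK.min(file_len - off);
        pending.insert(off, len);
        assert!(pending.len() <= WINDOW, "reorder queue must stay bounded");
        // Flush every response now contiguous with the local write cursor.
        while let Some(len) = pending.remove(&next_write) {
            next_write += len;
        }
    }
    assert_eq!(next_write, file_len);
    println!("wrote {} bytes in order", next_write);
}
```

The design point is that local writes always happen at `next_write`, so the output file is sequential even though the SFTP responses complete in arbitrary order, and the `BTreeMap` of buffered out-of-order responses never grows past the in-flight window.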
