fix: filter single-file safetensors by assigned layers before push#83
cjchanh wants to merge 1 commit into evilsocket:main from
When a worker is assigned a subset of layers from a single-file safetensors model, extract only the needed tensors instead of pushing the entire file. For Qwen2.5-7B-4bit (4 GiB), a 2-layer iPad worker now receives 250 MiB instead of 4 GiB, staying well under the 3 GiB iOS jetsam limit.

The indexed model path already filtered correctly via weight_map. This extends the same extraction to the single-file fallback by:

- Reading the safetensors header to enumerate tensor names
- Filtering by assigned layer prefixes
- Calling extract_layer_tensors to build a minimal blob
- Falling back to a full push when layers is empty (backward compat)

Verified: M5 master + iPad Air M3 worker, 2 layers, 250.1 MiB push, 1.4 GiB RSS, coherent output at 17.21 tok/s.
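The first step above relies on the safetensors on-disk framing: the file begins with an 8-byte little-endian length, followed by that many bytes of JSON metadata mapping tensor names to dtypes, shapes, and data offsets. A standalone sketch of just that framing (not the actual cake-core code) looks like this:

```rust
/// Sketch: parse the 8-byte little-endian header length that prefixes
/// every safetensors file, then slice out the JSON header bytes.
/// Returns None if the buffer is too short.
fn safetensors_header(bytes: &[u8]) -> Option<&[u8]> {
    let len = u64::from_le_bytes(bytes.get(..8)?.try_into().ok()?) as usize;
    bytes.get(8..8 + len)
}

fn main() {
    // Minimal fake file: header JSON `{}` preceded by its length.
    let json = b"{}";
    let mut file = (json.len() as u64).to_le_bytes().to_vec();
    file.extend_from_slice(json);
    assert_eq!(safetensors_header(&file), Some(&json[..]));
    println!("header = {}", String::from_utf8_lossy(safetensors_header(&file).unwrap()));
}
```

Enumerating tensor names then reduces to parsing that JSON header's keys, without ever reading the tensor data itself.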
Problem
When a Cake master distributes a single-file safetensors model to a worker, it pushes the entire file regardless of how many layers the worker is assigned. For Qwen2.5-7B-Instruct-4bit (a 4 GiB single file), an iPad worker with a 3 GiB jetsam budget receives the full 4 GiB, exceeds memory, and crashes with early eof.

The indexed model path (model.safetensors.index.json present) already filters correctly via weight_map. The single-file fallback at sharding/mod.rs unconditionally adds model.safetensors to the push list.

Fix
For single-file models with assigned layers, the push path now:

- Reads the safetensors header to enumerate tensor names
- Filters tensor names by assigned layer prefixes (same starts_with logic as the indexed path)
- Calls extract_layer_tensors to build a minimal safetensors blob containing only the needed tensors

Backward compatible: if layers is empty (no specific assignment), the full file is still pushed. If no tensors match the assigned layers, it falls back to a full push with a warning.

Results
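The filtering step and the empty-assignment fallback can be sketched as follows. This is a toy stand-in, not the real sharding/mod.rs code, and the "model.layers.{i}." prefix format is an assumption about how tensor names map to layers:

```rust
/// Sketch of the prefix filter: keep tensors whose names start with an
/// assigned layer's prefix. The "model.layers.{i}." naming is assumed;
/// the actual prefix logic lives in sharding/mod.rs.
fn filter_tensor_names(names: &[String], layers: &[usize]) -> Vec<String> {
    if layers.is_empty() {
        // Backward compat: no specific assignment means push the full set.
        return names.to_vec();
    }
    let prefixes: Vec<String> = layers
        .iter()
        .map(|l| format!("model.layers.{l}."))
        .collect();
    names
        .iter()
        .filter(|n| prefixes.iter().any(|p| n.starts_with(p)))
        .cloned()
        .collect()
}

fn main() {
    let names: Vec<String> = [
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.1.mlp.down_proj.weight",
        "model.layers.2.mlp.down_proj.weight",
        "model.embed_tokens.weight",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();

    // Worker assigned layers 0 and 1: only their tensors survive.
    let kept = filter_tensor_names(&names, &[0, 1]);
    assert_eq!(kept.len(), 2);

    // Empty assignment falls back to the full list.
    assert_eq!(filter_tensor_names(&names, &[]).len(), 4);
    println!("kept: {kept:?}");
}
```

Note the trailing dot in the prefix: it prevents layer 1 from accidentally matching layer 10 or 11 in deeper models.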
Tested with M5 Max master + iPad Air M3 worker, Qwen2.5-7B-Instruct-4bit:

- Push size: 250.1 MiB (was 4 GiB)
- Worker RSS: 1.4 GiB, under the 3 GiB jetsam budget
- Coherent output at 17.21 tok/s
- No crash (previously died with early eof)

Test plan
- cargo test -p cake-core --lib: 641 tests pass (638 existing + 3 new)
- cargo test -p cake-core --test unit: 235 tests pass
- cargo clippy: zero new warnings

New unit tests
- extract_layer_tensors_single_file_filters_correctly: 4 tensors in, request 2, verify only 2 in output with correct data bytes
- extract_layer_tensors_single_file_all_layers: request all tensors, verify all present with correct total size
- extract_layer_tensors_single_file_missing_tensor_errors: request a nonexistent tensor, verify error
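The first and third tests above can be illustrated with a self-contained sketch. The `extract` function here is a toy stand-in for extract_layer_tensors operating on a plain name-to-bytes map rather than a real safetensors blob; the names and signature are hypothetical:

```rust
use std::collections::BTreeMap;

/// Toy stand-in for extract_layer_tensors: copy only the requested
/// tensors out of the full set, erroring if any requested name is missing.
fn extract(
    tensors: &BTreeMap<String, Vec<u8>>,
    wanted: &[&str],
) -> Result<BTreeMap<String, Vec<u8>>, String> {
    wanted
        .iter()
        .map(|name| {
            tensors
                .get(*name)
                .map(|data| (name.to_string(), data.clone()))
                .ok_or_else(|| format!("tensor {name} not found"))
        })
        .collect()
}

fn main() {
    let mut tensors = BTreeMap::new();
    tensors.insert("model.layers.0.weight".to_string(), vec![1u8, 2]);
    tensors.insert("model.layers.1.weight".to_string(), vec![3u8, 4]);
    tensors.insert("model.layers.2.weight".to_string(), vec![5u8, 6]);
    tensors.insert("model.layers.3.weight".to_string(), vec![7u8, 8]);

    // 4 tensors in, request 2: only those 2 survive, bytes intact.
    let out = extract(&tensors, &["model.layers.0.weight", "model.layers.1.weight"]).unwrap();
    assert_eq!(out.len(), 2);
    assert_eq!(out["model.layers.0.weight"], vec![1, 2]);

    // A nonexistent tensor must error, mirroring the missing-tensor test.
    assert!(extract(&tensors, &["model.layers.9.weight"]).is_err());
    println!("ok");
}
```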