
fix(cloud): persist user LoRAs to /data volume across worker resets #926

Open
livepeer-tessa wants to merge 2 commits into main from fix/923-lora-persistence-across-worker-resets

Conversation

@livepeer-tessa (Contributor)

Problem

User-installed LoRAs were written to /tmp/.daydream-scope/assets/lora/ which is ephemeral on fal.ai workers. When a worker was reset between jobs, /tmp/ was wiped, causing pipeline load failures for longlive and krea-realtime-video:

scope.server.pipeline_manager - ERROR - Failed to load pipeline longlive: 
LongLivePipeline.__init__: LoRA loading failed. File not found: 
/tmp/.daydream-scope/assets/lora/SUPERSUISH_LoRA_V1_000000750.safetensors.
Ensure the file exists in the models/lora/ directory.

6+ occurrences observed 2026-04-12 15:17–15:22 UTC across sessions aa6d9669 and others. The LoRA loaded fine on the first job, then failed on every subsequent worker reset.

Closes #923.

Fix

Introduce USER_LORA_DIR = /data/models/user-loras (persistent /data volume) and point DAYDREAM_SCOPE_LORA_DIR at it instead of /tmp. This is kept separate from DAYDREAM_SCOPE_LORA_SHARED_DIR (/data/models/lora) which holds pre-bundled sample LoRAs.

Session isolation is preserved: cleanup_session_data() and the /internal/cleanup-session endpoint both wipe USER_LORA_DIR at session end, preventing LoRA files from one user leaking to the next user on the same worker.
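A minimal sketch of this approach, assuming illustrative function names (the PR's actual helpers in `fal_app.py` may be structured differently):

```python
import os
import shutil
from pathlib import Path

def setup_lora_env(data_root: str = "/data") -> Path:
    """Point DAYDREAM_SCOPE_LORA_DIR at a directory on the persistent volume."""
    user_lora_dir = Path(data_root) / "models" / "user-loras"
    user_lora_dir.mkdir(parents=True, exist_ok=True)
    os.environ["DAYDREAM_SCOPE_LORA_DIR"] = str(user_lora_dir)
    # Pre-bundled sample LoRAs live in a separate shared dir that is never wiped.
    os.environ["DAYDREAM_SCOPE_LORA_SHARED_DIR"] = str(Path(data_root) / "models" / "lora")
    return user_lora_dir

def cleanup_session_data(user_lora_dir: Path) -> None:
    """Delete user-installed LoRAs at session end so they cannot leak to the
    next user scheduled on the same worker."""
    if user_lora_dir.exists():
        shutil.rmtree(user_lora_dir)
    user_lora_dir.mkdir(parents=True, exist_ok=True)  # leave an empty dir behind
```

Because `/data` survives worker resets while `/tmp` does not, a LoRA installed during one job stays loadable after the worker is recycled, and the explicit session-end wipe keeps per-user isolation.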

Changes

| File | What changed |
| --- | --- |
| `fal_app.py` | Add `USER_LORA_DIR`, update `DAYDREAM_SCOPE_LORA_DIR` env, update `cleanup_session_data()` to also wipe user LoRAs |
| `livepeer_fal_app.py` | Add `USER_LORA_DIR`, update runner env |
| `livepeer_app.py` | Read `USER_LORA_DIR` from env, refactor cleanup helpers, clean user LoRAs on session end |
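On the runner side, `livepeer_app.py` is described as reading the directory from the environment; a hypothetical reader helper (name and fallback path are illustrative) might look like:

```python
import os
from pathlib import Path

def get_user_lora_dir(default: str = "/data/models/user-loras") -> Path:
    """Read USER_LORA_DIR from the environment, falling back to the
    persistent-volume default."""
    return Path(os.environ.get("USER_LORA_DIR", default))
```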

Testing

  • All 46 existing test_workflow_resolve.py tests pass ✅
  • No changes to LoRA install/download logic — only where the directory lives

Tessa (livepeer-tessa) added 2 commits April 12, 2026 18:25
Add dimension validation in parse_lora_weights() so a LoRA trained for a
different model size (e.g. Wan2.1-5B, in_features=5120) is rejected with a
user-friendly ValueError when loaded into the 1.3B model (in_features=1536),
rather than loading silently and crashing 150+ times at inference.

Before: mat1/mat2 shape mismatch RuntimeError deep in peft/torch at inference
After:  ValueError at load time naming the layer, expected vs actual dims, and
        a plain-language hint about model architecture mismatch

Also adds test_lora_dimension_validation.py covering:
- compatible LoRA loads without error
- 5B LoRA on 1.3B model raises ValueError
- error message is user-friendly (names layer + dimensions)
- out_features mismatch is also caught
- 5B LoRA on 5B model is fine

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
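The dimension check this commit describes could be sketched as a standalone helper (hypothetical; the real check is integrated into `parse_lora_weights()`). It takes peft-style weight names and plain shape tuples:

```python
def validate_lora_dims(lora_shapes: dict, model_dims: dict) -> None:
    """Raise a user-friendly ValueError when LoRA weight shapes do not match
    the target model's layer dimensions.

    lora_shapes: weight name -> shape, e.g. "blocks.0.attn.lora_A.weight" -> (16, 1536)
    model_dims:  layer name  -> (in_features, out_features) of the target model
    """
    for name, shape in lora_shapes.items():
        if name.endswith(".lora_A.weight"):  # lora_A is (rank, in_features)
            layer = name[: -len(".lora_A.weight")]
            expected = model_dims.get(layer)
            if expected and shape[1] != expected[0]:
                raise ValueError(
                    f"LoRA weight for layer '{layer}' has in_features={shape[1]}, "
                    f"but this model expects in_features={expected[0]}. The LoRA "
                    "was likely trained for a different model architecture."
                )
        elif name.endswith(".lora_B.weight"):  # lora_B is (out_features, rank)
            layer = name[: -len(".lora_B.weight")]
            expected = model_dims.get(layer)
            if expected and shape[0] != expected[1]:
                raise ValueError(
                    f"LoRA weight for layer '{layer}' has out_features={shape[0]}, "
                    f"but this model expects out_features={expected[1]}. The LoRA "
                    "was likely trained for a different model architecture."
                )
```

With a check like this, a 5B-trained LoRA (in_features=5120) loaded against 1.3B dimensions (in_features=1536) fails fast at load time with the offending layer named, instead of surfacing as a shape-mismatch RuntimeError deep in peft/torch at inference.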
…923)

User-installed LoRAs were written to /tmp/.daydream-scope/assets/lora/
which is ephemeral on fal.ai workers.  When a worker was reset between
jobs the /tmp directory was wiped, causing pipeline load failures for
longlive and krea-realtime-video with errors like:

  LoRA loading failed. File not found:
  /tmp/.daydream-scope/assets/lora/SUPERSUISH_LoRA_V1_000000750.safetensors

Fix: introduce USER_LORA_DIR = /data/models/user-loras (persistent volume)
and point DAYDREAM_SCOPE_LORA_DIR at it instead of /tmp.  This is kept
separate from DAYDREAM_SCOPE_LORA_SHARED_DIR (/data/models/lora) which
holds pre-bundled sample LoRAs that must not be cleaned up between sessions.

Session cleanup (cleanup_session_data / /internal/cleanup-session) is
updated to wipe USER_LORA_DIR at session end so that one user's LoRAs
cannot leak to the next user on the same worker.

Affected files:
- src/scope/cloud/fal_app.py        — add USER_LORA_DIR, update env setup,
                                      update cleanup_session_data()
- src/scope/cloud/livepeer_fal_app.py — add USER_LORA_DIR, update runner env
- src/scope/cloud/livepeer_app.py   — read USER_LORA_DIR from env, refactor
                                      cleanup helpers, clean loras on session end

Closes #923

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>

coderabbitai Bot commented Apr 12, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2239c0bd-4f86-4038-9b5a-ca53b28a2b66


@github-actions (Contributor)

🚀 fal.ai Preview Deployment

| | |
| --- | --- |
| App ID | `daydream/scope-pr-926--preview` |
| WebSocket | `wss://fal.run/daydream/scope-pr-926--preview/ws` |
| Commit | `fa1a655` |

Livepeer Runner

| | |
| --- | --- |
| App ID | `daydream/scope-livepeer-pr-926--preview` |
| WebSocket | `wss://fal.run/daydream/scope-livepeer-pr-926--preview/ws` |
| Auth | private |

Testing Livepeer Mode

```shell
SCOPE_CLOUD_MODE=livepeer SCOPE_CLOUD_APP_ID="daydream/scope-livepeer-pr-926--preview/ws" uv run daydream-scope
```



Development

Successfully merging this pull request may close these issues.

[fal.ai] longlive/krea-realtime-video: LoRA lost after worker reset — /tmp/.daydream-scope/assets/lora/ cleared between jobs (SUPERSUISH_LoRA_V1)
