
fix(lora): validate LoRA dimensions against model at load time (#922) #925

Open
livepeer-tessa wants to merge 1 commit into main from fix/lora-dimension-validation-922

Conversation

@livepeer-tessa
Contributor

Problem

Closes #922.

When a LoRA trained for Wan2.1-5B (in_features=5120) is loaded into the Wan2.1-1.3B model (in_features=1536), peft_lora.py successfully loads the adapter (layer names match) but then fails at inference with:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (768x1536 and 5120x32)

This produced 156 identical errors in session 067a55be (14:41–14:55 UTC today) while consuming GPU resources the entire time, with no user-visible error message.

Root Cause

parse_lora_weights() matched LoRA layer names to model parameters without checking that the weight shapes were compatible. The mismatch (lora_A.shape[1] != model_weight.shape[1]) only surfaced later during the torch.mm call inside peft/tuners/lora/layer.py.
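For illustration (this is not code from the PR), the failure mode reduces to the matrix-multiplication inner-dimension rule: the 1.3B model's activations have 1536 features, but the 5B LoRA's A matrix expects 5120, and nothing checks this until torch.mm runs.

```python
def can_matmul(a_shape, b_shape):
    """mat1 @ mat2 is only defined when mat1's columns equal mat2's rows."""
    return a_shape[1] == b_shape[0]

activations = (768, 1536)  # tokens x in_features on the 1.3B model
lora_A_T = (5120, 32)      # 5B LoRA's A matrix transposed for x @ A.T

# Inner dims 1536 vs 5120 disagree: this is exactly the RuntimeError above.
assert not can_matmul(activations, lora_A_T)
```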

Fix

Add dimension validation inside parse_lora_weights() immediately after a match is found. If the LoRA's in_features or out_features don't match the model layer's weight shape, raise a ValueError with a user-readable message naming the layer, the LoRA dimensions, the model dimensions, and a plain-language hint about model architecture mismatch.
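A minimal sketch of that validation, assuming the conventional LoRA shapes (A: rank x in_features, B: out_features x rank); the function name and signature are illustrative, not the exact code in parse_lora_weights():

```python
def validate_lora_dims(layer_name, lora_A_shape, lora_B_shape, model_weight_shape):
    """Raise ValueError if the LoRA matrices don't fit the model layer."""
    out_features, in_features = model_weight_shape
    lora_in = lora_A_shape[1]   # A is (rank, in_features)
    lora_out = lora_B_shape[0]  # B is (out_features, rank)
    if lora_in != in_features or lora_out != out_features:
        raise ValueError(
            f"LoRA dimension mismatch at layer '{layer_name}': "
            f"LoRA expects ({lora_out}x{lora_in}) but model layer is "
            f"({out_features}x{in_features}). "
            "This LoRA was likely trained for a different model size "
            "(e.g. Wan2.1-5B vs 1.3B). Please use a LoRA that matches "
            "the loaded model architecture."
        )
```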

Before:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (768x1536 and 5120x32)
  File peft/tuners/lora/layer.py, line 807, in forward
  ... (at inference, 156 times)

After:

ValueError: LoRA dimension mismatch at layer 'blocks.0.self_attn.q':
LoRA expects (256x5120) but model layer is (256x1536).
This LoRA was likely trained for a different model size (e.g. Wan2.1-5B vs 1.3B).
Please use a LoRA that matches the loaded model architecture.
  (raised at load time, before inference starts)

The ValueError propagates through PeftLoRAStrategy.load_adapters_from_list (caught → RuntimeError: LoRA loading failed) and through PermanentMergeLoRAStrategy the same way.
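The wrapping pattern looks roughly like this sketch, with a stub standing in for the real parser (the actual method lives on PeftLoRAStrategy):

```python
def parse_lora_weights_stub(adapter):
    # Stand-in for the real parse_lora_weights(): raises on a mismatch.
    if adapter["lora_in"] != adapter["model_in"]:
        raise ValueError("LoRA dimension mismatch at layer 'blocks.0.self_attn.q'")

def load_adapters_from_list(adapters):
    try:
        for adapter in adapters:
            parse_lora_weights_stub(adapter)
    except ValueError as exc:
        # Wrapped so callers see one consistent failure mode at load time.
        raise RuntimeError("LoRA loading failed") from exc
```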

Tests

5 new unit tests in tests/test_lora_dimension_validation.py:

  • Compatible LoRA loads without error
  • 5B LoRA on 1.3B model raises ValueError
  • Error message names the layer and both dimension sets
  • Out-features mismatch is also caught
  • 5B LoRA on 5B model is fine

All 5 pass locally.
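For a rough idea of the test shape (the real tests are in tests/test_lora_dimension_validation.py; the toy validator here replaces the real parse_lora_weights()):

```python
def validate(lora_in, model_in, layer="blocks.0.self_attn.q"):
    # Toy stand-in: rejects an in_features mismatch with a named layer.
    if lora_in != model_in:
        raise ValueError(f"LoRA dimension mismatch at layer '{layer}'")

def test_5b_lora_on_1_3b_model_raises():
    try:
        validate(lora_in=5120, model_in=1536)
    except ValueError as exc:
        assert "blocks.0.self_attn.q" in str(exc)
    else:
        raise AssertionError("expected ValueError for 5B LoRA on 1.3B model")

test_5b_lora_on_1_3b_model_raises()
```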

Add dimension validation in parse_lora_weights() so a LoRA trained for a
different model size (e.g. Wan2.1-5B, in_features=5120) is rejected with a
user-friendly ValueError when loaded into the 1.3B model (in_features=1536),
rather than loading silently and crashing 150+ times at inference.

Before: mat1/mat2 shape mismatch RuntimeError deep in peft/torch at inference
After:  ValueError at load time naming the layer, expected vs actual dims, and
        a plain-language hint about model architecture mismatch

Also adds test_lora_dimension_validation.py covering:
- compatible LoRA loads without error
- 5B LoRA on 1.3B model raises ValueError
- error message is user-friendly (names layer + dimensions)
- out_features mismatch is also caught
- 5B LoRA on 5B model is fine

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eda5b514-e4e3-46f0-9c1b-c4b80e111a38


@github-actions
Contributor

🚀 fal.ai Preview Deployment

App ID daydream/scope-pr-925--preview
WebSocket wss://fal.run/daydream/scope-pr-925--preview/ws
Commit f7917d8

Livepeer Runner

App ID daydream/scope-livepeer-pr-925--preview
WebSocket wss://fal.run/daydream/scope-livepeer-pr-925--preview/ws
Auth private

Testing Livepeer Mode

SCOPE_CLOUD_MODE=livepeer SCOPE_CLOUD_APP_ID="daydream/scope-livepeer-pr-925--preview/ws" uv run daydream-scope



Development

Successfully merging this pull request may close these issues.

[fal.ai] longlive/PEFT LoRA: mat1/mat2 shape mismatch (768x1536 vs 5120x32) during inference — LoRA rank incompatibility at qkv linear layer
