
[None][fix] Do not leak KV cache quantization into vision encoder#13181

Merged
2ez4bz merged 1 commit into NVIDIA:main from 2ez4bz:dev-nano-v3-fp8-kv-cache-fix
Apr 20, 2026

Conversation

@2ez4bz
Collaborator

@2ez4bz 2ez4bz commented Apr 19, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved quantization configuration handling in RADIO vision models to prevent conflicts between parent and sub-component settings when quantization is disabled, ensuring correct initialization and stable inference behavior.
  • Tests

    • Added unit tests for RADIO vision model quantization scenarios, improving coverage and guarding against future regressions in quantization behavior.

Description

  • Why?

We were erroneously passing the KV cache quantization config
from the LLM into the vision encoder for Nemotron models.

  • What?

This commit fixes that and adds a regression test for it.
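The essence of the fix can be sketched as follows. Note this is an illustrative stand-in, not the actual tensorrt_llm code: the `QuantConfig` dataclass and the `sanitize_vision_quant_config` helper below are hypothetical, modeled only on the behavior described in this PR (start from a blank `QuantConfig()` when quantization is disabled, rather than inheriting the parent LLM's `kv_cache_quant_algo`).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuantConfig:
    # Stand-in for tensorrt_llm's QuantConfig; only the fields relevant here.
    quant_algo: Optional[str] = None
    kv_cache_quant_algo: Optional[str] = None

def sanitize_vision_quant_config(parent_config: QuantConfig,
                                 disable_quantization: bool) -> QuantConfig:
    """Return the quant config the vision encoder should use.

    When quantization is disabled for the vision encoder, start from a
    blank QuantConfig() instead of inheriting the parent LLM's config, so
    settings like an FP8 kv_cache_quant_algo do not leak into a
    sub-component that has no KV cache manager.
    """
    if disable_quantization:
        return QuantConfig()
    return parent_config

# The parent LLM may run with an FP8 KV cache...
llm_config = QuantConfig(kv_cache_quant_algo="FP8")
# ...but the vision encoder must not inherit that setting.
vit_config = sanitize_vision_quant_config(llm_config, disable_quantization=True)
print(vit_config.kv_cache_quant_algo)  # None
```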

Test Coverage

  • New test file added to catch the error that this commit fixes (verified to fail without it).

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@2ez4bz
Collaborator Author

2ez4bz commented Apr 19, 2026

/bot run

@coderabbitai
Contributor

coderabbitai bot commented Apr 19, 2026

📝 Walkthrough

Walkthrough

This change modifies RADIO vision model quantization handling to clear KV-cache quantization settings when disable_quantization is True, preventing conflicts with vision encoders lacking KV cache managers. A unit test validates the corrected behavior under FP8 settings.

Changes

  • Vision Model Quantization Fix — tensorrt_llm/_torch/models/modeling_radio.py: Modified RADIOVisionModel.__init__ to replace the existing quant_config with a blank QuantConfig() when disable_quantization=True, instead of retaining kv_cache_quant_algo settings that could conflict with the vision encoder's architecture.
  • Testing Infrastructure — tests/integration/test_lists/test-db/l0_a10.yml, tests/unittest/_torch/modeling/test_modeling_radio.py: Added a new CUDA-only unit test for the RADIO vision model that validates forward-pass execution with disable_quantization=True under FP8 KV-cache settings. The test constructs a minimal ViT configuration and verifies output tensor dimensions. The test selection list was updated to include the new test module in the PyTorch pre-merge pipeline.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 60.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The PR title clearly and specifically describes the main fix: preventing KV cache quantization from being incorrectly applied to the vision encoder.
  • Description check — ✅ Passed: The PR description includes all required sections: rationale (Why), solution summary (What), test coverage details, and a completed PR checklist matching the repository template.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/unittest/_torch/modeling/test_modeling_radio.py (1)

82-95: Consider asserting quant config is sanitized before forward.

A direct assertion makes the regression intent explicit and catches config leakage earlier than backend execution failures.

Proposed test hardening
 def test_radio_fp8_parent_kv_cache_does_not_leak_into_vit(tiny_vit_config):
@@
     vision_model = RADIOVisionModel(_make_fp8_model_config(), disable_quantization=True)
+    assert vision_model.model_config.quant_config is not None
+    assert vision_model.model_config.quant_config.quant_algo is None
+    assert vision_model.model_config.quant_config.kv_cache_quant_algo is None
@@
     with torch.inference_mode():
         features = vision_model.forward(pixel_values)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/modeling/test_modeling_radio.py` around lines 82 - 95,
The test should explicitly assert that the model's quantization config was
sanitized/disabled before calling forward: after creating vision_model via
RADIOVisionModel(_make_fp8_model_config(), disable_quantization=True) add an
assertion that the quant/config state reflects disabling (for example assert
vision_model.disable_quantization is True or assert getattr(vision_model,
"quantization_config", None) is None) so the regression intent is explicit and
any config leakage is caught prior to calling vision_model.forward.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9cf1ca88-f436-4572-85e1-d22760bd6142

📥 Commits

Reviewing files that changed from the base of the PR and between f85e3a3 and 3c2a3cb.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/models/modeling_radio.py
  • tests/integration/test_lists/test-db/l0_a10.yml
  • tests/unittest/_torch/modeling/test_modeling_radio.py

@tensorrt-cicd
Collaborator

PR_Github #44153 [ run ] triggered by Bot. Commit: 3c2a3cb Link to invocation

@2ez4bz 2ez4bz enabled auto-merge (squash) April 19, 2026 05:28
@tensorrt-cicd
Collaborator

PR_Github #44153 [ run ] completed with state SUCCESS. Commit: 3c2a3cb
/LLM/main/L0_MergeRequest_PR pipeline #34580 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@tijyojwad
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44207 [ run ] triggered by Bot. Commit: 3c2a3cb Link to invocation

@ZhanruiSunCh
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44225 [ run ] triggered by Bot. Commit: 3c2a3cb Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44225 [ run ] completed with state SUCCESS. Commit: 3c2a3cb
/LLM/main/L0_MergeRequest_PR pipeline #34648 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-nano-v3-fp8-kv-cache-fix branch from 3c2a3cb to 5643300 on April 20, 2026 17:47
@2ez4bz
Collaborator Author

2ez4bz commented Apr 20, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44491 [ run ] triggered by Bot. Commit: 5643300 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44491 [ run ] completed with state SUCCESS. Commit: 5643300
/LLM/main/L0_MergeRequest_PR pipeline #34892 completed with status: 'SUCCESS'

CI Report

Link to invocation

@2ez4bz 2ez4bz merged commit 30a65b7 into NVIDIA:main Apr 20, 2026
5 checks passed