
[#11625][fix] handle list-typed eos_token_id in SamplingParams._setup #13178

Open

edenfunf wants to merge 2 commits into NVIDIA:main from edenfunf:fix/sampling-params-list-eos-token-id

Conversation

@edenfunf
Contributor

@edenfunf edenfunf commented Apr 18, 2026

What

SamplingParams._setup() crashes trtllm-serve with std::bad_cast whenever
the tokenizer's eos_token_id is a list instead of a scalar int.

Before:

# sampling_params.py – _setup()
self.end_id = tokenizer.eos_token_id   # could be a list → C++ crash

After:

eos_token_id = tokenizer.eos_token_id
if isinstance(eos_token_id, list):
    self.end_id = eos_token_id[0] if eos_token_id else None
    # register remaining tokens so generation still halts on all of them
    for tok in eos_token_id[1:]:
        if tok not in (self.stop_token_ids or []):
            self.stop_token_ids = (self.stop_token_ids or []) + [tok]
else:
    self.end_id = eos_token_id

Why

HuggingFace tokenizers for models like Llama 3.1 expose
eos_token_id = [128001, 128009] (a List[int]).
_setup() blindly assigned this list to self.end_id, which is typed
Optional[int]. Downstream, base_worker.py passes end_id directly into
tllm.Request(end_id=...) — a C++ nanobind binding that expects int32_t.
PyBind/nanobind cannot cast a Python list to int32_t, so it raises
std::bad_cast, killing the server for every inference request on
any model whose tokenizer advertises multiple EOS tokens.
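The failure mode can be mimicked in pure Python with ctypes, which enforces a C int32 slot the same way the binding's signature does. This is only an analogy for illustration; the real rejection happens inside the nanobind layer behind tllm.Request:

```python
# Analogy for the nanobind failure: a C int32 slot rejects a Python list.
# ctypes stands in here for the real tllm.Request(end_id=...) binding.
import ctypes

eos_token_id = [128001, 128009]  # what newer tokenizers return for Llama 3.1

try:
    ctypes.c_int32(eos_token_id)  # like passing the list as end_id
except TypeError as exc:
    print("rejected by the int32 slot:", exc)

scalar = ctypes.c_int32(eos_token_id[0])  # a scalar is accepted fine
print("scalar accepted:", scalar.value)
```

The only difference in the real code path is that nanobind surfaces the mismatch as std::bad_cast from C++ rather than a Python TypeError.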

Why is this PR open if issue #11625 is closed?

Issue #11625 was closed by a maintainer who was unable to reproduce the
crash on their own hardware ("I'm closing this issue. Please feel free to open a new one if you need~"). It was not closed because a fix was merged.
The original reporter confirmed the crash persisted after the closure.
upstream/main still contains the unfixed assignment as of this PR.

The root cause is version-dependent: newer versions of transformers
(≥ 4.43) began returning List[int] from tokenizer.eos_token_id for
models whose tokenizer_config.json has a list-valued eos_token_id.
Earlier versions always returned a scalar, which is why the bug was not
universally reproducible.

How

  • Read tokenizer.eos_token_id into a local variable first.
  • If it is a list, use eos_token_id[0] as the scalar end_id and
    append eos_token_id[1:] to stop_token_ids so that generation
    terminates on all configured EOS tokens, not just the first one.
  • If it is already a scalar, behaviour is identical to before.
  • The existing generation_config.eos_token_id list-normalisation logic
    (lines 436-447) is untouched; de-duplication between the two paths is
    handled by the existing stop_token not in self.stop_token_ids guard.

The change is minimal and isolated to the 14 new lines inside _setup().
No other call sites are touched, and there are no architecture changes.
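The bullet points above condense to a small pure function. The sketch below uses a hypothetical normalize_eos helper name for clarity; in the actual patch the logic is inlined in _setup() rather than factored out:

```python
# Standalone sketch of the normalisation described above.
# normalize_eos is a hypothetical name; the real change is inline in _setup().
from typing import List, Optional, Tuple, Union


def normalize_eos(
    eos_token_id: Union[int, List[int], None],
    stop_token_ids: Optional[List[int]] = None,
) -> Tuple[Optional[int], Optional[List[int]]]:
    """Return (end_id, stop_token_ids) with end_id guaranteed scalar."""
    if isinstance(eos_token_id, list):
        end_id = eos_token_id[0] if eos_token_id else None
        stops = list(stop_token_ids or [])
        for tok in eos_token_id[1:]:
            if tok not in stops:  # de-duplication guard
                stops.append(tok)
        return end_id, stops or stop_token_ids
    return eos_token_id, stop_token_ids


print(normalize_eos([128001, 128009]))  # (128001, [128009])
print(normalize_eos(2))                 # (2, None)
```

Note the empty-list case: normalize_eos([]) yields (None, None), matching the `eos_token_id[0] if eos_token_id else None` guard in the patch.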

Test

Before fix (reproduction):

from tensorrt_llm.sampling_params import SamplingParams

class FakeTokenizer:
    eos_token_id = [128001, 128009]  # Llama 3.1 style
    pad_token_id = 0

params = SamplingParams()
params._setup(FakeTokenizer(), None, None)
# self.end_id is now [128001, 128009] → std::bad_cast when passed to C++

After fix: params.end_id == 128001 (int), params.stop_token_ids == [128009].

Regression tests added in tests/unittest/llmapi/test_sampling_params.py
(pure Python, no GPU or model weights required):

  • test_setup_scalar_eos_token_id: scalar path unchanged
  • test_setup_list_eos_token_id_single_element: single-element list → scalar
  • test_setup_list_eos_token_id_multiple_elements: multi-element list, end_id is int
  • test_setup_list_eos_token_id_no_duplicate_stop_tokens: no duplicate stop tokens
  • test_setup_end_id_already_set_ignores_tokenizer_eos: explicit end_id not overwritten
  • test_setup_list_eos_token_id_pad_fallback: pad_id fallback still an int
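To give a flavour of the multi-element case, here is a self-contained sketch: StubParams stands in for SamplingParams so it runs without tensorrt_llm installed, and its simplified _setup signature is an assumption of this sketch, not the library's real one. The actual regression tests exercise the real class:

```python
# Self-contained sketch of the multi-element regression test.
# StubParams mirrors the patched _setup() logic; the real test uses
# tensorrt_llm.sampling_params.SamplingParams instead, and the
# one-argument _setup here is a simplification for illustration.
class FakeTokenizer:
    eos_token_id = [128001, 128009]  # Llama 3.1 style
    pad_token_id = 0


class StubParams:
    end_id = None
    stop_token_ids = None

    def _setup(self, tokenizer):
        eos_token_id = tokenizer.eos_token_id
        if isinstance(eos_token_id, list):
            self.end_id = eos_token_id[0] if eos_token_id else None
            for tok in eos_token_id[1:]:
                if tok not in (self.stop_token_ids or []):
                    self.stop_token_ids = (self.stop_token_ids or []) + [tok]
        else:
            self.end_id = eos_token_id


params = StubParams()
params._setup(FakeTokenizer())
assert isinstance(params.end_id, int) and params.end_id == 128001
assert params.stop_token_ids == [128009]
print("end_id:", params.end_id, "stop_token_ids:", params.stop_token_ids)
```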

All pre-commit hooks passed (CRLF, ruff, ruff-format, codespell, DCO).

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of end-of-sequence tokens to support tokenizers with multiple EOS token IDs.
  • Tests

    • Added unit tests for SamplingParams initialization and token configuration behavior.

HuggingFace tokenizers for models like Llama 3.1 may return a list for
eos_token_id (e.g. [128001, 128009]).  SamplingParams._setup previously
assigned this list directly to self.end_id, which is typed Optional[int].
When the C++ binding (tllm.Request) received a list instead of an int it
raised std::bad_cast, crashing trtllm-serve for every inference request.

Fix: when eos_token_id is a list, use the first element as end_id (scalar)
and append the remaining tokens to stop_token_ids so that generation still
halts on all configured EOS tokens.

Fixes NVIDIA#11625

Signed-off-by: 許元豪 <146086744+edenfunf@users.noreply.github.com>
@edenfunf edenfunf changed the title [None][fix] handle list-typed eos_token_id in SamplingParams._setup [#11625][fix] handle list-typed eos_token_id in SamplingParams._setup Apr 18, 2026
@coderabbitai
Contributor

coderabbitai bot commented Apr 18, 2026

📝 Walkthrough

Walkthrough

Modified SamplingParams._setup() to handle tokenizer.eos_token_id as either a single value or list. When a list is provided, end_id is set to the first element and remaining EOS tokens are appended to stop_token_ids, with duplicate prevention. Comprehensive unit tests validate the new behavior across six test scenarios.

Changes

  • SamplingParams EOS Token Handling (tensorrt_llm/sampling_params.py):
    Modified _setup() to accept eos_token_id as scalar or list. Sets end_id to the first element and appends remaining EOS tokens to stop_token_ids while preventing duplicates.
  • SamplingParams Unit Tests (tests/unittest/llmapi/test_sampling_params.py):
    New test suite with 6 test cases validating scalar/list eos_token_id handling, duplicate avoidance, explicit end_id preservation, and fallback pad_id behavior with a mock tokenizer.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description check ✅ Passed: The PR description thoroughly covers What, Why, and How with detailed context, root cause analysis, code examples, test coverage table, and confirms pre-commit compliance.
  • Title check ✅ Passed: The title clearly and concisely summarizes the main change: handling list-typed eos_token_id in SamplingParams._setup(). It accurately reflects the core problem and solution.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unittest/llmapi/test_sampling_params.py`:
- Line 1: Update the copyright header string "Copyright 2025 NVIDIA CORPORATION
& AFFILIATES" in this new test file to the latest meaningful modification year
(change 2025 to 2026) so the file header matches the repository policy; locate
the header line at the top of tests/unittest/llmapi/test_sampling_params.py and
replace the year accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 33a2bb26-0d45-4d5b-b5be-c307ddef63ee

📥 Commits

Reviewing files that changed from the base of the PR and between 66d7711 and 49a4774.

📒 Files selected for processing (2)
  • tensorrt_llm/sampling_params.py
  • tests/unittest/llmapi/test_sampling_params.py

Comment thread on tests/unittest/llmapi/test_sampling_params.py (Outdated)
Signed-off-by: 許元豪 <146086744+edenfunf@users.noreply.github.com>
@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Apr 18, 2026