fix: harden retool rollout against multi-turn / retry desync by leofan-lab · Pull Request #1861 · THUDM/slime

leofan-lab · 2026-04-24T17:05:35Z

Summary

Four small fixes to examples/retool/generate_with_retool.py that protect one invariant across multi-turn tool-calling rollouts: sample.rollout_log_probs, loss_masks, response_token_ids, and response must stay length-aligned. If any of them falls out of sync, slice_log_prob_with_cp in the trainer crashes with a confusing length-mismatch error.
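The invariant can be written as an explicit check. This is a minimal sketch, not code from slime. The `check_alignment` helper and the `toy_decode` stand-in tokenizer are hypothetical. It assumes the per-token fields are plain Python lists and that `response` should equal the decoding of `response_token_ids`, which is why fix 4 below re-decodes the text after trimming.

```python
from typing import Callable, Sequence


def check_alignment(
    response_token_ids: Sequence[int],
    loss_masks: Sequence[int],
    rollout_log_probs: Sequence[float],
    response: str,
    decode: Callable[[Sequence[int]], str],
) -> None:
    """Raise ValueError if the per-token fields or the text field disagree.

    Hypothetical helper (not part of slime): it fails on the rollout side,
    next to the cause, instead of later in the trainer's log-prob slicing.
    """
    lengths = {
        "response_token_ids": len(response_token_ids),
        "loss_masks": len(loss_masks),
        "rollout_log_probs": len(rollout_log_probs),
    }
    if len(set(lengths.values())) != 1:
        raise ValueError(f"per-token fields out of sync: {lengths}")
    if decode(response_token_ids) != response:
        raise ValueError("response does not match decoded response_token_ids")


# Toy "tokenizer" for illustration only: one token per character code.
def toy_decode(ids: Sequence[int]) -> str:
    return "".join(chr(i) for i in ids)


check_alignment([104, 105], [1, 1], [-0.1, -0.2], "hi", toy_decode)  # passes
```

With a real subword tokenizer, decoding a concatenation of turns may not reproduce the concatenated turn texts byte for byte. In that case an exact string comparison may be stricter than needed, and the length check is the part that matters for the trainer.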

When this fires

Hit during async retool training runs once samples started cycling through the rollout manager's retry path (aborted → re-enqueued) and once tool outputs got large enough to push past the context cap. Four separate ways the invariant broke in practice; four small fixes.

Fixes

1. Reset stale sample state at entry. Retried samples arrive with rollout_log_probs / response / loss_mask still populated from the prior attempt; the main loop appends onto the stale list and desyncs from the fresh response_token_ids. Clear all four fields up front.
2. Clamp per-turn max_new_tokens to the remaining context budget. Otherwise a turn can append up to rollout_max_response_len tokens on top of a total already near max_context_length, producing samples larger than the training-side per-partition cap.
3. Abort when sglang returns text without output_token_logprobs. The old fallback retokenized the text, which grows response_token_ids without a matching log-probs entry. Return ABORTED so the rollout manager re-queues the group cleanly instead of poisoning the trainer.
4. Trim post-tool-output overflow. Fix #2 clamps the model's generation, but tool output (e.g. a large print() from code_interpreter) is appended unconstrained. Trim response_token_ids / loss_masks / rollout_log_probs together to keep their lengths aligned, re-decode response so the text field matches, and mark the sample TRUNCATED.
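The four fixes can be sketched together as follows. This is a minimal illustration under stated assumptions, not the code in generate_with_retool.py. The `Sample` dataclass, the `Status` enum, and the helper names (`reset_sample`, `turn_budget`, `append_model_turn`, `append_tool_output`) are hypothetical. Two further conventions are also assumptions: model tokens get loss mask 1, and tool-output tokens get loss mask 0 with a placeholder log-prob of 0.0.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):  # hypothetical stand-in for the sample status enum
    PENDING = "pending"
    TRUNCATED = "truncated"
    ABORTED = "aborted"


@dataclass
class Sample:  # hypothetical: only the fields this PR talks about
    prompt_token_ids: List[int]
    response: str = ""
    response_token_ids: List[int] = field(default_factory=list)
    loss_masks: List[int] = field(default_factory=list)
    rollout_log_probs: List[float] = field(default_factory=list)
    status: Status = Status.PENDING


def reset_sample(sample: Sample) -> None:
    """Fix 1: drop state left over from an aborted, re-enqueued attempt."""
    sample.response = ""
    sample.response_token_ids = []
    sample.loss_masks = []
    sample.rollout_log_probs = []
    sample.status = Status.PENDING


def turn_budget(sample: Sample, max_context_length: int, rollout_max_response_len: int) -> int:
    """Fix 2: cap this turn's max_new_tokens at the remaining context."""
    used = len(sample.prompt_token_ids) + len(sample.response_token_ids)
    return max(0, min(rollout_max_response_len, max_context_length - used))


def append_model_turn(sample: Sample, token_ids: List[int], text: str,
                      log_probs: Optional[List[float]]) -> bool:
    """Fix 3: abort rather than retokenize when log-probs are missing."""
    if log_probs is None or len(log_probs) != len(token_ids):
        sample.status = Status.ABORTED  # the rollout manager re-queues it
        return False
    sample.response += text
    sample.response_token_ids += token_ids
    sample.loss_masks += [1] * len(token_ids)  # assumption: model tokens are trained on
    sample.rollout_log_probs += log_probs
    return True


def append_tool_output(sample: Sample, token_ids: List[int], text: str,
                       max_context_length: int, decode: Callable[[List[int]], str]) -> None:
    """Append tool tokens, then (fix 4) trim all fields together on overflow."""
    sample.response += text
    sample.response_token_ids += token_ids
    sample.loss_masks += [0] * len(token_ids)  # assumption: tool tokens masked out
    sample.rollout_log_probs += [0.0] * len(token_ids)  # assumption: placeholder
    room = max(0, max_context_length - len(sample.prompt_token_ids))
    if len(sample.response_token_ids) > room:
        sample.response_token_ids = sample.response_token_ids[:room]
        sample.loss_masks = sample.loss_masks[:room]
        sample.rollout_log_probs = sample.rollout_log_probs[:room]
        sample.response = decode(sample.response_token_ids)  # re-sync the text
        sample.status = Status.TRUNCATED
```

Every append extends the three per-token lists by the same count, and the trim slices all three at the same index. The lists therefore stay equal in length by construction. Only the text needs re-decoding, because a token boundary does not map cleanly onto the appended tool text.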

Risk

Blast radius: low — file is under examples/, consumers opt in via --custom-generate-function-path.
Behavior change: fix #3 is the notable one. Pre-patch, a sample with missing log-probs would silently desync the trainer (it eventually crashes, but far from the cause). Post-patch, the same condition aborts and re-queues. Operators should treat an uptick in ABORTED samples here as a signal to check the sglang version / --return-logprob routing.
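One way to act on that signal is to compute the per-batch fraction of ABORTED samples and warn when it jumps well above what a run normally sees. This is a sketch only. The lowercase string status values, the operator-chosen `baseline`, the default `factor` of 2, and the logger name are all assumptions for illustration.

```python
import logging
from collections import Counter
from typing import Iterable

logger = logging.getLogger("retool.rollout")  # hypothetical logger name


def aborted_fraction(statuses: Iterable[str]) -> float:
    """Fraction of samples in a batch whose status is 'aborted'."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return counts["aborted"] / total if total else 0.0


def check_abort_rate(statuses: Iterable[str], baseline: float, factor: float = 2.0) -> bool:
    """Warn (and return True) when the abort rate exceeds factor * baseline.

    `baseline` is an operator-chosen value (e.g. the rate seen on a known-good
    run); it is an assumption, not a slime setting.
    """
    rate = aborted_fraction(statuses)
    if rate > baseline * factor:
        logger.warning(
            "aborted rate %.1f%% exceeds %.1fx baseline %.1f%%; "
            "check sglang version / --return-logprob routing",
            rate * 100, factor, baseline * 100,
        )
        return True
    return False
```

Comparing against a baseline rather than a fixed absolute threshold reflects the wording above: an *uptick* is the signal, since some aborts are expected once the retry path is in use.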

Tests

None added. examples/retool/ has no unit-test coverage in the repo today; a full integration test would require mocking sglang + the tool sandbox. Each fix's failure mode is described concretely in the commit message and inline comments, and is reproducible by hand with a 0.5B model + retool yaml + a deliberately noisy tool.

Keeps sample.rollout_log_probs, loss_masks, response_token_ids, and response length-aligned across turns. Four fixes:

1. Reset stale sample state at entry so a retried (previously aborted) sample doesn't concatenate new tokens onto old log-probs.
2. Clamp per-turn max_new_tokens to the remaining context budget so total_length can't blow past max_context_length and produce samples larger than the training-side per-partition cap.
3. Abort when sglang returns text without output_token_logprobs. The old fallback retokenized, which grows response_token_ids without matching log-probs; aborting lets the rollout manager re-queue cleanly.
4. Trim post-tool-output overflow. Fix #2 clamps the model's generation, but tool output is appended unconstrained. Trim tokens / loss_masks / log-probs together and mark the sample TRUNCATED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

zhuzilin merged commit 41dc3b6 into THUDM:main May 11, 2026

zhuzilin merged 1 commit into THUDM:main from leofan-lab:fix/retool-multi-turn-safety
