fix: harden retool rollout against multi-turn / retry desync#1861

Merged
zhuzilin merged 1 commit into THUDM:main from leofan-lab:fix/retool-multi-turn-safety
May 11, 2026

Conversation

@leofan-lab
Contributor

Summary

Four small fixes to examples/retool/generate_with_retool.py that protect one invariant across multi-turn tool-calling rollouts: sample.rollout_log_probs, loss_masks, response_token_ids, and response must stay the same length. Any of these getting out of sync crashes slice_log_prob_with_cp in the trainer with a confusing length-mismatch error.

When this fires

Hit during async retool training runs once samples started cycling through the rollout manager's retry path (aborted → re-enqueued) and once tool outputs got large enough to push past the context cap. Four separate ways the invariant broke in practice; four small fixes.

Fixes

  1. Reset stale sample state at entry. Retried samples arrive with rollout_log_probs / response / loss_mask still populated from the prior attempt; the main loop appends onto the stale list and desyncs from the fresh response_token_ids. Clear all four fields up front.

  2. Clamp per-turn max_new_tokens to the remaining context budget. A turn can otherwise append up to rollout_max_response_len tokens on top of a total already near max_context_length, producing samples larger than the training-side per-partition cap.

  3. Abort when sglang returns text without output_token_logprobs. The old fallback retokenized the text, which grows response_token_ids without a matching log-probs entry. Return ABORTED so the rollout manager re-queues the group cleanly instead of poisoning the trainer.

  4. Trim post-tool-output overflow. Fix 2 clamps the model's generation, but tool output (e.g. a large print() from code_interpreter) is appended unconstrained. Trim response_token_ids / loss_masks / rollout_log_probs together to keep lengths aligned, re-decode response so the text field matches, and mark the sample TRUNCATED.

Risk

  • Blast radius: low — file is under examples/, consumers opt in via --custom-generate-function-path.
  • Behavior change: fix 3 is the notable one. Pre-patch, a sample with missing logprobs would silently desync the trainer (eventually crash, but far from the cause). Post-patch, the same condition aborts and re-queues. Operators should treat an uptick in ABORTED samples here as a signal to check sglang version / --return-logprob routing.

Tests

None added. examples/retool/ has no unit-test coverage in the repo today; a full integration test would require mocking sglang + the tool sandbox. Each fix's failure mode is described concretely in the commit message and inline comments, and is reproducible by hand with a 0.5B model + retool yaml + a deliberately noisy tool.

Keeps sample.rollout_log_probs, loss_masks, response_token_ids, and response
length-aligned across turns. Four fixes:

1. Reset stale sample state at entry so a retried (previously aborted)
   sample doesn't concat new tokens onto old log-probs.

2. Clamp per-turn max_new_tokens to the remaining context budget so
   total_length can't blow past max_context_length, producing samples
   larger than the training-side per-partition cap.

3. Abort when sglang returns text without output_token_logprobs. The old
   fallback retokenized, which grows response_token_ids without matching
   log-probs — abort lets the rollout manager re-queue cleanly.

4. Trim post-tool-output overflow. Fix 2 clamps the model's generation,
   but tool output is appended unconstrained. Trim tokens / loss_masks /
   log-probs together and mark TRUNCATED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zhuzilin merged commit 41dc3b6 into THUDM:main May 11, 2026