fix(fast-llm): stop producers and helpers on training completion by jlamypoirier · Pull Request #144 · ServiceNow/PipelineRL

jlamypoirier · 2026-05-29T16:43:32Z

gpt-5 Codex note:

This is a fixup for PR #142. It extends the Fast-LLM early-stop fix beyond the actor path.

Changes:

The preprocessor now uses the explicit Fast-LLM training_finished event before stopping, instead of the legacy sample-counting heuristic.
The launcher now stops remaining supervised helper processes after the finetune process has exited cleanly and training completion has been observed. This covers redis/actor/preprocessor-style helpers, not only inference servers.
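The launcher change above can be illustrated with a minimal sketch. This is not the code in pipelinerl/launch.py: the names `should_stop_helpers`, `stop_helpers`, and `spawn_sleeping_helper` are hypothetical, and the terminate-then-kill escalation with a grace period is an assumption about how a supervisor might shut helpers down.

```python
import subprocess
import sys


def should_stop_helpers(trainer_returncode, training_finished):
    """Stop helpers only after a clean trainer exit AND an observed completion signal.

    trainer_returncode is None while the trainer is still running.
    """
    return trainer_returncode == 0 and training_finished


def stop_helpers(helpers, grace_seconds=5.0):
    """Terminate every still-running helper, escalating to kill after the grace period."""
    running = [proc for proc in helpers if proc.poll() is None]
    for proc in running:
        proc.terminate()  # polite request first
    for proc in running:
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # helper ignored the request; force it
            proc.wait()


def spawn_sleeping_helper():
    """Hypothetical stand-in for a long-lived helper (redis, actor, preprocessor)."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
```

Requiring both conditions means a crashed trainer (non-zero exit) or a trainer that exited without signalling completion does not trigger this shutdown path, so those cases can still be handled as failures.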

Why:

In Fast-LLM mode, samples_processed counts Redis entries read rather than optimizer steps completed, so the sample-counting heuristic could still make the preprocessor stop converting actor rollouts into fast_llm_streaming documents before training had actually finished.
After a clean Fast-LLM completion, non-inference helper processes can otherwise keep the launcher alive even though the trainer has finished.
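The difference between the two stop conditions can be sketched as follows. This is an illustration only, not the code in pipelinerl/preprocess.py: `preprocessor_should_stop`, `fast_llm_mode`, and `max_samples` are hypothetical names, and a `threading.Event` stands in for whatever mechanism actually delivers the training_finished event.

```python
import threading


def preprocessor_should_stop(fast_llm_mode, training_finished, samples_processed, max_samples):
    """Decide whether the preprocessor loop may exit.

    training_finished is a threading.Event used here as a stand-in for the
    explicit completion signal (an assumption for this sketch).
    """
    if fast_llm_mode:
        # samples_processed counts Redis entries read, not optimizer progress,
        # so only the explicit completion signal is trusted in this mode.
        return training_finished.is_set()
    # Legacy heuristic: stop once enough samples have been counted.
    return samples_processed >= max_samples
```

With this shape, a large samples_processed value cannot end preprocessing in Fast-LLM mode by itself; the loop keeps producing documents until the trainer reports that it is done.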

Verification from the local branch:

/Users/joel.lamy-poirier/Projects/Fast-LLM/venv/bin/python -m py_compile pipelinerl/launch.py pipelinerl/preprocess.py tests/test_launch_process_monitoring.py
git diff --check
Targeted pytest collection is blocked in this local environment because tests/conftest.py imports pipelinerl.vllm1, which requires uvloop.

fix(preprocess): use training_done under Fast-LLM

27724b4

jlamypoirier changed the base branch from fast-llm to fix/actor-finished-uses-training-done-for-fast-llm May 29, 2026 16:56

fix(launch): stop helpers after training completion

33af04e

jlamypoirier changed the title ~~fix(preprocess): use training_done under Fast-LLM~~ fix(fast-llm): stop producers and helpers on training completion May 29, 2026

jlamypoirier deleted the branch ServiceNow:fast-llm June 5, 2026 19:07

jlamypoirier closed this Jun 5, 2026

jlamypoirier reopened this Jun 5, 2026

jlamypoirier changed the base branch from fix/actor-finished-uses-training-done-for-fast-llm to fast-llm June 5, 2026 19:09

jlamypoirier merged commit add17c7 into ServiceNow:fast-llm Jun 5, 2026

jlamypoirier deleted the fix/actor-finished-uses-training-done-for-fast-llm branch June 5, 2026 19:09

jlamypoirier merged 2 commits into ServiceNow:fast-llm from jlamypoirier:fix/actor-finished-uses-training-done-for-fast-llm
