Pass CWL ToolTimeLimit.timelimit to slurm job submission #5502

Draft
lonbar wants to merge 11 commits into DataBiosphere:master from lonbar:issues/featurebranch

Conversation

@lonbar lonbar commented Apr 23, 2026

Provides an initial implementation of the option to pass runtimes to slurm's resource allocation. Addresses #3037.

Changelog Entry

To be copied to the draft changelog by merger:

  • Runtimes for slurm jobs can now be set using --defaultWalltime
  • Time limits from CWL's ToolTimeLimit are used in slurm batch submissions

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passed tests, including the Gitlab tests, for the most recent commit in its branch.
  • Make sure the PR has been reviewed. If not, review it. If it has been reviewed and any requested changes seem to have been addressed, proceed.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@mr-c
Contributor

mr-c commented Apr 24, 2026

@adamnovak CI is running using https://github.com/DataBiosphere/toil/tree/issues/3037-wallclock-slurm (a.k.a contrib/admin/test-pr lonbar issues/featurebranch issues/3037-wallclock-slurm)

Comment thread src/toil/cwl/cwltoil.py Outdated
Comment thread src/toil/job.py Outdated
@adamnovak
Member

Oh excellent, we have wanted this for a while.

@lonbar
Author

lonbar commented Apr 27, 2026

@mr-c I have been made aware of --maxJobDuration, which allows toil itself to cancel jobs after a specified time. I was wondering if it makes sense to use this input instead of adding --defaultWalltime.

My thinking is that it might make sense to keep them separate, as currently the logic is:

  1. If ToolTimeLimit.timelimit is set, use that value.
  2. If ToolTimeLimit.timelimit is not set but --defaultWalltime is used, use the value of --defaultWalltime.
  3. Otherwise, do not set a time limit for slurm jobs.

This means that --defaultWalltime provides a baseline wall time for steps that do not specify one. If, say, a step in a CWL workflow sets ToolTimeLimit.timelimit: 0, the resulting slurm submission will not contain a wall time argument, even if --defaultWalltime is set. --maxJobDuration allows a user to put an upper bound on the runtime of such jobs. Happy to hear your thoughts.
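The three-step precedence above could be sketched roughly as follows. This is only an illustration, not the code in this PR: `resolve_walltime` and `slurm_time_args` are hypothetical names, both values are assumed to be in seconds, and the conversion relies on sbatch's `--time` accepting a plain number of minutes.

```python
from typing import List, Optional


def resolve_walltime(
    tool_time_limit: Optional[int],
    default_walltime: Optional[int],
) -> Optional[int]:
    """Pick the wall time (in seconds) for a step, or None for no limit.

    Hypothetical helper; both arguments are assumed to be in seconds.
    """
    if tool_time_limit is not None:
        # Case 1: ToolTimeLimit.timelimit is set and wins.
        # A value of 0 means "no limit", so no wall time is passed on,
        # even when a default wall time is configured.
        return tool_time_limit if tool_time_limit > 0 else None
    # Case 2: fall back to --defaultWalltime if it was given.
    # Case 3: otherwise this is None and no time limit is set.
    return default_walltime


def slurm_time_args(seconds: Optional[int]) -> List[str]:
    """Build the extra sbatch arguments for a resolved wall time."""
    if seconds is None:
        return []
    # Round up to whole minutes, since --time takes minutes here.
    minutes = -(-seconds // 60)
    return [f"--time={minutes}"]
```

With this sketch, a step with `timelimit: 0` yields no `--time` argument at all, which is the case where --maxJobDuration would still provide an upper bound.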

@mr-c
Contributor

mr-c commented Apr 27, 2026

> @mr-c I have been made aware of --maxJobDuration, which allows toil itself to cancel jobs after a specified time. I was wondering if it makes sense to use this input instead of adding --defaultWalltime.
>
> My thinking is that it might make sense to keep them separate, as currently the logic is:
>
> 1. If `ToolTimeLimit.timelimit` is set, use that value.
> 2. If `ToolTimeLimit.timelimit` is not set but `--defaultWalltime` is used, use the value of `--defaultWalltime`.
> 3. Otherwise, do not set a time limit for slurm jobs.
>
> This means that --defaultWalltime provides a baseline wall time for steps that do not specify one. If, say, a step in a CWL workflow sets ToolTimeLimit.timelimit: 0, the resulting slurm submission will not contain a wall time argument, even if --defaultWalltime is set. --maxJobDuration allows a user to put an upper bound on the runtime of such jobs. Happy to hear your thoughts.

I think it makes sense to keep --defaultWalltime and still respect --maxJobDuration if set, yes.
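
One possible reading of "still respect --maxJobDuration", sketched here purely as an assumption: treat it as a cap on whatever wall time was otherwise chosen. `cap_walltime` is a hypothetical helper, not something this PR or Toil is confirmed to contain, and both values are assumed to be in seconds.

```python
from typing import Optional


def cap_walltime(
    walltime: Optional[int],
    max_job_duration: Optional[int],
) -> Optional[int]:
    """Apply an upper bound (seconds) to an already-resolved wall time.

    Hypothetical helper: None means "not set" for either argument.
    """
    if max_job_duration is None:
        # No upper bound requested; keep the resolved value as-is.
        return walltime
    if walltime is None:
        # The step has no limit of its own, so the bound is the limit.
        return max_job_duration
    # Both are set: the smaller value wins.
    return min(walltime, max_job_duration)
```

Under this reading, a step with no wall time of its own would still be bounded whenever --maxJobDuration is given.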
