Add User-Defined Tools #19434

Merged
jmchilton merged 88 commits into galaxyproject:dev from mvdbeek:user_defined_tools on May 12, 2025

Conversation

@mvdbeek
Member

@mvdbeek mvdbeek commented Jan 21, 2025

User-Defined Tools (Beta)

Starting with Galaxy 25.0, users can create their own tools without requiring administrator privileges to install them. These tools are written in YAML, defined through the Galaxy user interface, and stored in the database.

Differences from Standard Galaxy Tools

Standard Galaxy tools are written in XML and have broad access to the Galaxy database and filesystem during the command and configuration file templating phase, which uses the Cheetah templating language.

For example, the following XML tool command section queries the Galaxy database and writes a file to the home directory of the system user running the Galaxy process:

<command><![CDATA[
    #from pathlib import Path
    #user_id = $__app__.model.session().query($__app__.model.User.id).one()
    #open(f"{Path.home()}/a_file", "w").write("Hello!")
]]></command>

This level of access is acceptable when only administrators install tools. However, allowing regular users to define and execute arbitrary tools requires stricter controls.

To address this, Galaxy now supports a restricted tool language for user-defined tools. This format is modeled after the XML tool definition but replaces Cheetah templating with sandboxed JavaScript expressions that do not have access to the database or filesystem.

Example: Concatenate Files Tool (YAML)

class: GalaxyUserTool
id: cat_user_defined
version: "0.1"
name: Concatenate Files
description: tail-to-head
container: busybox
shell_command: |
  cat $(inputs.datasets.map((input) => input.path).join(' ')) > output.txt
inputs:
  - name: datasets
    multiple: true
    type: data
outputs:
  - name: output1
    type: data
    format_source: datasets
    from_work_dir: output.txt
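
To make the embedded expression in shell_command concrete, here is a rough Python analogue of what the JavaScript inside $() computes for this tool. It is only an illustration: Galaxy evaluates the expression as sandboxed JavaScript, and the helper name render_cat_command and the example paths are hypothetical.

```python
def render_cat_command(input_paths):
    """Mimic inputs.datasets.map((input) => input.path).join(' ') in Python.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    # Join the dataset paths with single spaces, as the JS expression does.
    joined = " ".join(input_paths)
    # Substitute the result where $(...) appears in shell_command.
    return f"cat {joined} > output.txt"
```

Calling render_cat_command(["/data/a.txt", "/data/b.txt"]) yields the string "cat /data/a.txt /data/b.txt > output.txt", which is the shape of the command the container would run.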

Equivalent Tool in XML:

<tool id="cat" name="Concatenate Files" version="0.1">
    <description>tail-to-head</description>
    <requirements>
        <container type="docker">busybox</container>
    </requirements>
    <command><![CDATA[
cat
#for $dataset in $datasets
    '$dataset'
#end for
> '$output1'
    ]]></command>
    <inputs>
        <param name="datasets" format="data" type="data" multiple="true"/>
    </inputs>
    <outputs>
        <data name="output1" format_source="datasets"/>
    </outputs>
</tool>

While the structure is similar, several key differences exist:

  • The YAML version includes a required class: GalaxyUserTool line to signal the use of the restricted UserToolSource schema.
  • All user-defined tools must be executed inside a container, specified using the container key.
  • The command to be executed is defined under the shell_command key, using a string with embedded JavaScript expressions inside $(). In the example above, the expression iterates over the input dataset paths and joins them into a single command string.
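
The three points above can be sketched as a small pre-flight check on a parsed tool definition. This is a minimal sketch, not the UserToolSource schema: the function name check_user_tool is hypothetical, and treating shell_command as mandatory is an assumption based on the description above.

```python
def check_user_tool(tool):
    """Return a list of problems found in a user-defined tool dict.

    Hypothetical helper; Galaxy's real validation is schema-based.
    """
    problems = []
    # The class line signals the restricted user tool format.
    if tool.get("class") != "GalaxyUserTool":
        problems.append("class must be GalaxyUserTool")
    # User-defined tools must run inside a container.
    if not tool.get("container"):
        problems.append("container is required")
    # Assumption: a tool without a command is not useful.
    if not tool.get("shell_command"):
        problems.append("shell_command is required")
    return problems
```

An empty list means the dict passes these three checks; it says nothing about whether the full schema would accept it.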

Enabling User-Defined Tools

To enable this feature:

  1. Set enable_beta_tool_formats: true in your Galaxy configuration.
  2. Create a role of type Custom Tool Execution in the admin user interface.
  3. Assign users or groups to this role.

Sharing User-Defined Tools

User-defined tools are private to their creators. However, if a tool is embedded in a workflow, any user who imports that workflow will automatically have the tool created in their account.

These tools can also be exported to disk and loaded like regular tools, enabling instance-wide availability if needed.

Security considerations

User-defined tools share the same security risks as interactive tools.
See https://training.galaxyproject.org/training-material/topics/admin/tutorials/interactive-tools/tutorial.html#securing-interactive-tools for an extended discussion.
While the feature is in beta, we recommend that only trusted users are allowed to use it.

Limitations

The user-defined tool language is still evolving, and additional safety audits are ongoing.

Current limitations include:

  • configfiles are not supported
  • Access to reference data is not supported
  • Access to metadata and metadata files (such as BAM indexes) is not supported
  • Access to the extra_files directory is not supported

TODO

  • More tests, especially selenium
  • Parse into separate tool_type so that User Defined Tools can be addressed in job_conf.yml / tpv more easily
  • Usage docs
  • Publish UserToolSource schema

Here's a screenshot of the embedded tool editor.

[Screenshot: embedded tool editor, 2025-01-21 15:59:57]

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

Comment thread lib/galaxy/tool_util/models.py
# and is scoped to to individual user and never adds to global toolbox
dynamic_tools_manager: DynamicToolManager = depends(DynamicToolManager)

@router.get("/api/unprivileged_tools")
Member

I feel like I would prefer a name like 'user_tools' but if it is a big change - feel free to ignore. Also sorry I'm reviewing by commit so if any of these changes are undone in future commits feel free to ignore.

Also like the last comment - feel free to just dump the dynamic tools API endpoint and replace it with this. I think I like that name better also.

Member Author

Absolutely open to changing how we name things, we did already have 3 or 4 different versions when we hacked on this in Berlin. I'll keep that for a last pass when it's nearing completion though.

@jmchilton
Member

jmchilton commented Jan 23, 2025

This is amazing. I don't hate any of it. The CWL fields I think we would struggle to fill out with our runtime are the only part that caused significant stress, but I didn't see an attempt to fill those out. I would have semantic questions like is name going to be the name or collection identifier, etc., but I don't think those are details you've tackled yet unless I missed the commit.

I would have started with locking tools down at the XML layer and had dozens of test cases around making sure tool action expressions cannot be evaluated, etc., for unprivileged tools, but I understand that part is pretty unsexy, and I think there is some chance that having a fully defined model means those things might be completely unreachable, so that might have been unnecessary work. I think we need to at least audit all the features before the final merge.

I created a list of things I'd like to see broken out into smaller PRs to clean up the core as I was reviewing the commits. None of this is essential - if it works, it works - but any of that extra effort would be appreciated and would ease follow-up reviews, I think, and help isolate potential problems.

  • Other John will want the migration pulled into its own commit, I’d like it if you used abstractions from migration util instead of alembic directly. 8e318b1
  • Fix YAML and tool output default handling in f7c0014 (Rebased with d1e7bb8)
  • Mapping type fixes in model/init.py in c2b72e0
  • Component refactor in 9f8ed6d
  • Typing for trans in _workflow_to_dict_run in b3a98d7
  • get to get_one in tools/init in b3a98d7
  • Fix 1d0977c I think?
  • 782caae (container parsing from dicts)
  • de27de2 (collection type in parameter models)
  • f756bcb (better discriminators in parameter models)
  • 95b2eb4 (fix conditionals in YAML tools)
  • ec05106 (Fix job parameter summary for inputs without label)
  • Modeling existing requirements in parameters/interface in 6a98fe8 (would be hard to unwind with new requirement though)

Member

@nsoranzo nsoranzo left a comment

Suggestions from my current work on the cwl-1.0 branch.

Comment thread lib/galaxy/managers/tools.py Outdated
Comment thread lib/galaxy/managers/tools.py Outdated
Comment thread lib/galaxy/webapps/galaxy/api/dynamic_tools.py
Comment thread lib/galaxy/workflow/modules.py
Comment thread lib/galaxy_test/api/test_tools.py Outdated
@jmchilton
Member

> I think we need to at least audit all the features before the final merge.

Maybe the way to address this is worrying about documentation. A tool translation guide maybe where each feature from XML is listed (under contents in https://docs.galaxyproject.org/en/master/dev/schema.html) and how to port it to YAML and if there are any security considerations. I did a lot of work in syncing XSD and YAML model docs in #18787 - I think we will want something like that for the broader tools right? We will need to keep model docs and XSD docs synchronized but also have separate customizations for each. It is kind of a hard problem but worth thinking about and maybe capturing security concerns at this point.

@mvdbeek
Member Author

mvdbeek commented Jan 23, 2025

> I would have semantic questions like is name going to be the name or collection identifier, etc., but I don't think those are details you've tackled yet unless I missed the commit.

We already had to do that for the conditional step work; the implementation is in galaxy.workflow.modules.to_cwl. The keys of the inputs object are the input names, and the inner elements are the element identifiers (since there's no name anyway for nested collections). For leaf elements (i.e. class: File), basename is defined as value.dataset.created_from_basename or element_identifier or value.name ... which is debatable.
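
The basename fallback described in this comment can be sketched in isolation. This is a minimal sketch under stated assumptions: the Dataset and DatasetInstance classes below are hypothetical stand-ins for Galaxy's model objects, and cwl_basename is an illustrative name rather than the function in Galaxy's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    # Hypothetical stand-in for the underlying dataset record.
    created_from_basename: Optional[str] = None


@dataclass
class DatasetInstance:
    # Hypothetical stand-in for the value passed to the expression.
    name: str
    dataset: Dataset


def cwl_basename(value, element_identifier=None):
    """Pick the first non-empty candidate, in the order given in the comment."""
    return value.dataset.created_from_basename or element_identifier or value.name
```

Because the chain uses `or`, an empty string is treated the same as a missing value and falls through to the next candidate.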

> I think we need to at least audit all the features before the final merge.

Definitely. I didn't want to go down that route before getting a bit of feedback that we're moving in the right direction, and convincing myself that authoring YAML tools with the schema and shell_command autocompletion is a nice experience. Yes, the input model helps in avoiding some things, but right now you still have Cheetah available, e.g. in the label section of the outputs; that probably needs to go as well.

I agree that there is a lot of work left, and some work that can be pulled out separately, I'll do that as I keep making progress here.

> how to port it to YAML and if there are any security considerations

Since we parse both XML and YAML into ToolSource, and we can build the representation out of the ToolSource, there is also an opportunity to convert existing XML tools (minus the Cheetah, configfiles, etc.). That's a nice way of exploring how it'll feel to build complex tools in the YAML variant.
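
As a rough illustration of the conversion idea, the sketch below maps the static parts of an XML tool onto a dict shaped like the YAML example in the PR description. It does not use Galaxy's ToolSource; the function name xml_tool_to_user_tool_dict is hypothetical, the XML uses Galaxy's param and data elements, and the Cheetah command is deliberately left out because it cannot be translated mechanically.

```python
import xml.etree.ElementTree as ET

# Small XML tool used only for this illustration.
XML_TOOL = """<tool id="cat" name="Concatenate Files" version="0.1">
    <description>tail-to-head</description>
    <inputs>
        <param name="datasets" format="data" type="data" multiple="true"/>
    </inputs>
    <outputs>
        <data name="output1" format_source="datasets"/>
    </outputs>
</tool>"""


def xml_tool_to_user_tool_dict(xml_text):
    """Copy id, name, version, description, inputs and outputs into a dict."""
    root = ET.fromstring(xml_text)
    return {
        "class": "GalaxyUserTool",
        "id": root.get("id"),
        "version": root.get("version"),
        "name": root.get("name"),
        "description": root.findtext("description"),
        "inputs": [
            {
                "name": param.get("name"),
                "type": param.get("type"),
                # XML attributes are strings; the YAML form uses a boolean.
                "multiple": param.get("multiple") == "true",
            }
            for param in root.findall("./inputs/param")
        ],
        "outputs": [
            {
                "name": data.get("name"),
                "type": "data",
                "format_source": data.get("format_source"),
            }
            for data in root.findall("./outputs/data")
        ],
    }
```

The resulting dict still needs a container and a hand-written shell_command before it resembles a complete user-defined tool.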

> think we will want something like that for the broader tools right? We will need to keep model docs and XSD docs synchronized

Indeed, that's probably the next thing to do to make the JSON schema really helpful. For instance, I have sprinkled a few annotations into the model, and I just copied the relevant parts from the XSD. We should keep that in sync somehow.
[Screenshot, 2025-01-23 17:25:54]

Comment on lines +1368 to +1413
hidden: Mapped[Optional[bool]] = mapped_column(default=False)
active: Mapped[Optional[bool]] = mapped_column(default=True)
Member

These 2 columns are also present in "DynamicTool", is the duplication because the plan is to make DynamicTools sharable by associating multiple user ids to the same tool id?

Member Author

Right, DynamicTool contains the actual tool representation, and might be created by an admin, and those are globally available if the public flag is set. The private, user defined tools also use DynamicTool to store the representation (with public=False), but ownership and access is managed through UserDynamicToolAssociation. The same DynamicTool can be associated to multiple UserDynamicToolAssociations, that's why I think we should manage active and hidden on the association.
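
The relationship described here can be sketched with plain dataclasses. This is a sketch under loud assumptions: the classes mirror only the flags named in this thread, and the visibility rule in tools_visible_to (a hypothetical function) is one possible reading of those flags, not the behavior of Galaxy's models.

```python
from dataclasses import dataclass


@dataclass
class DynamicTool:
    # Holds the tool representation; may be shared by several users.
    id: int
    public: bool = False
    active: bool = True


@dataclass
class UserDynamicToolAssociation:
    # Per-user ownership and access record pointing at one DynamicTool.
    user_id: int
    dynamic_tool_id: int
    hidden: bool = False
    active: bool = True


def tools_visible_to(user_id, tools, associations):
    """Return ids of tools the user can see under the assumed rules."""
    own = {
        assoc.dynamic_tool_id
        for assoc in associations
        if assoc.user_id == user_id and assoc.active and not assoc.hidden
    }
    # Assumption: public tools are visible to everyone, private ones
    # only through an active, non-hidden association.
    return [tool.id for tool in tools if tool.active and (tool.public or tool.id in own)]
```

Keeping hidden and active on the association lets two users share one DynamicTool while one of them hides it without affecting the other.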

Member

@nsoranzo nsoranzo Apr 30, 2025

Thanks for the explanation! I find the 5 properties a bit confusing and I fear the overlap may lead to future bugs. Do you think we can streamline them and clearly define what each is used for?
E.g.:

  • DynamicTool.public: Admin-only. If set to True, it makes a user-defined tool visible and executable to all users
  • DynamicTool.active: Admin-only. If set to False, it makes a user-defined tool not executable by any user (what about visible?)
  • UserDynamicToolAssociation.hidden: User-only: If set to True, makes a user-defined tool not visible (but it can still be executed by the user)

Not sure we need the others? DynamicTool.hidden seems to be covered by public, and UserDynamicToolAssociation.active seems unnecessary, as a user should be able to just hide or delete the user-defined tool they don't want to use any more.
I may well be off track with the above definitions, but hopefully you get my point.

Member Author

@mvdbeek mvdbeek Apr 30, 2025

I can add comments, and we can add a hidden flag for UserDynamicToolAssociation if we need it down the road. I am wary of modifying DynamicTool.hidden; this might still be a valid path (i.e. runnable but hidden)?

Member

Thanks, comments would be great, then we can discuss more if needed!

@mvdbeek mvdbeek force-pushed the user_defined_tools branch from ec05106 to a0c1e15 Compare January 31, 2025 17:48
Comment thread lib/galaxy/managers/tools.py Outdated
@mvdbeek mvdbeek mentioned this pull request Feb 26, 2025
4 tasks
@mvdbeek mvdbeek force-pushed the user_defined_tools branch 3 times, most recently from f8e134d to 01d05c5 Compare March 27, 2025 11:59
Comment thread lib/galaxy/tools/expressions/evaluation.py
@mvdbeek mvdbeek force-pushed the user_defined_tools branch 6 times, most recently from bad3bf7 to e8979e9 Compare March 31, 2025 14:30
@mvdbeek mvdbeek force-pushed the user_defined_tools branch from 942aca1 to f311f7d Compare May 12, 2025 08:55
@jmchilton jmchilton merged commit 64a3b40 into galaxyproject:dev May 12, 2025
56 of 58 checks passed
@jmchilton
Member

As impressive as it is exciting - amazing work @mvdbeek!

@mvdbeek mvdbeek mentioned this pull request May 12, 2025
16 tasks
@mvdbeek
Member Author

mvdbeek commented May 12, 2025

A lot of thanks also goes to @mr-c for the initial hackathon, pushing the idea and providing so many lessons from the CWL side!

@martenson
Member

yooohooo 🎉

6 participants