
avoid bam2msa to create BAM index in inputdir #3986

Merged
bgruening merged 3 commits into galaxyproject:master from pavanvidem:bam2msa-input-fix
Sep 25, 2021

@pavanvidem
Member

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

@bernt-matthias
Contributor

Hey @pavanvidem, how do you identify tools that write to the input dir?

@pavanvidem
Member Author

First, we extracted the names of dataset files in the input dir that do not end with .dat, using something like find . -not -name "*.dat" -type f | grep -v 'dataset_.*_files'. Then we used gxadmin query q to query the job table for each dataset name and extracted the tools from the results. Finally, we manually checked/ran each tool and fixed it.
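The first step could be sketched in Python as follows. This is a minimal, hypothetical re-implementation of the find | grep filter above. The function names find_stray_files and dataset_ids are made up for illustration. It also assumes a naming convention: a stray file such as dataset_123.dat.bai is taken to carry the ID of the dataset it sits next to, and that ID is what one would then look up in the job table.

```python
import os
import re

# Assumption: stray files are named after their dataset, e.g. dataset_123.dat.bai.
DATASET_ID = re.compile(r"dataset_(\d+)")
# Same pattern the grep -v step uses to skip extra-files directories.
EXTRA_FILES = re.compile(r"dataset_.*_files")


def find_stray_files(root):
    """Relative paths of regular files under root that do not end in .dat,
    excluding anything whose path matches dataset_.*_files."""
    stray = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like find -type f: regular files only, not symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if name.endswith(".dat"):
                continue
            rel = os.path.relpath(path, root)
            if EXTRA_FILES.search(rel):
                continue
            stray.append(rel)
    return sorted(stray)


def dataset_ids(paths):
    """Dataset IDs parsed from the file names, to look up in the job table."""
    ids = set()
    for path in paths:
        match = DATASET_ID.search(os.path.basename(path))
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)
```

Calling dataset_ids(find_stray_files(...)) on the file dir, with the path left as a placeholder for your own setup, yields the candidate IDs for the job-table query.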

@bernt-matthias
Contributor

@pavanvidem I started a discussion here: galaxyproject/planemo#1189

@bernt-matthias
Contributor

Still, the manual approach will be needed (since I guess we won't have perfect test coverage).

Comment thread tools/bioext/bam2msa.xml
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[
## avoid bam2msa to create .bai in inputdir
ln -s '$input' input_bam &&
Contributor


Galaxy already stores the index files (only for BAM), and you can access them with $input.metadata.bam_index. So ln -s '$input.metadata.bam_index' input_bam.bai might spare bam2msa from having to recreate the index?

But I guess then we need a version bump.
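The combination discussed in this thread could look roughly like this in Python. It links the input into the working directory and also links Galaxy's stored index next to it. This is a sketch only: stage_bam is a hypothetical helper that mirrors the two ln -s commands. Linking the index only helps if the tool reuses an existing .bai instead of rewriting it.

```python
import os


def stage_bam(input_bam, bam_index=None, workdir="."):
    """Symlink the input BAM (and optionally Galaxy's stored index) into workdir.

    A tool that writes <bam>.bai next to the path it is given will then write
    into workdir instead of Galaxy's file dir. Returns the path to pass to the tool.
    """
    link = os.path.join(workdir, "input_bam")
    # Mirrors: ln -s '$input' input_bam
    os.symlink(os.path.abspath(input_bam), link)
    if bam_index is not None:
        # Mirrors: ln -s '$input.metadata.bam_index' input_bam.bai
        os.symlink(os.path.abspath(bam_index), link + ".bai")
    return link
```

Without bam_index, only the first link is created and the tool builds its own index inside workdir. That is the behaviour the original ln -s '$input' input_bam line relies on.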

@bernt-matthias
Contributor

Hey @pavanvidem, I just thought a bit more about this. The problem is that old versions of the tool will still write to Galaxy's file dir. How about configuring your Galaxy to run jobs as a separate user that does not have write permissions for Galaxy's file dir:

https://github.com/galaxyproject/galaxy/blob/40ddc72f485ae233f3a4aed63847ccb041003320/lib/galaxy/config/sample/galaxy.yml.sample#L2121
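With jobs running as a separate user, one way to confirm the protection is to check, as that job user, whether Galaxy's file dir is writable. Here is a minimal Python sketch. The function name job_user_can_write is hypothetical, and the directory to check is whatever your file dir is.

```python
import os


def job_user_can_write(path):
    """True if the calling user can create files in the directory `path`.

    os.access checks the real UID/GID, so run this as the job user.
    Caveat: for root it typically reports write access regardless of mode bits.
    """
    # Creating a file in a directory needs write and search (execute) permission.
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
```

With the suggested setup, this should return False for the file dir and True for the job's working directory.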

Still, the problem needs to be fixed: I'm currently running IUC's weekly CI with an extended version of planemo (galaxyproject/planemo#1190): https://github.com/bernt-matthias/tools-iuc/actions/runs/1277704614

@pavanvidem
Member Author

> How about configuring your Galaxy to run jobs as a separate user that does not have write permissions for Galaxy's file dir?

This is a good idea. I should have mentioned that I queried the EU Galaxy database, not my local instance. This might have covered most of the old tool versions and the tools outside IUC.

@natefoo
Member

natefoo commented Oct 29, 2021

Your excellent work was foiled by the tool again, @pavanvidem. It looks like this one generates the index regardless of whether one already exists. If the index is a symlink to a read-only file, that fails.

I can open a PR upstream, but for current versions of the tool, I think we will unfortunately have to remove the .bai symlink and let the tool generate the .bai itself.
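The failure comes from regenerating the index even when one is present. A guard like the one below is one way a tool could avoid it. This is a Python sketch with a hypothetical build_index callable, not BioExt's actual code or its upstream fix.

```python
import os


def ensure_index(bam_path, build_index):
    """Build <bam_path>.bai only if no index exists yet.

    build_index is a hypothetical callable (bam_path, bai_path) standing in for
    whatever the tool uses to write the index. Returns True if it was called.
    """
    bai = bam_path + ".bai"
    if os.path.exists(bai):
        # An index is already present, e.g. a symlink to Galaxy's stored,
        # read-only index: reuse it instead of trying to overwrite it.
        return False
    build_index(bam_path, bai)
    return True
```

With such a guard, the .bai symlink to Galaxy's stored index could stay in the wrapper. Without it, the index has to be generated in the working directory as described above.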

@natefoo
Member

natefoo commented Oct 29, 2021

Upstream PR: veg/BioExt#45

@bernt-matthias
Contributor

@pavanvidem can you check if a bump of the tool version to 0.20.4 fixes this?

