Add MAGmax #7869
base: main
Changes from 1 commit
2531aba
a5c1043
264023a
b23a452
e8d8cc8
2a7ac90
77b1ce5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| name: magmax | ||
| owner: iuc | ||
| description: bin Merging and reAssembly tool | ||
| homepage_url: https://github.com/soedinglab/MAGmax | ||
| long_description: | | ||
| MAGmax is a dereplication tool designed to maximize the recovery of | ||
| Metagenome-Assembled Genomes (MAGs) through bin Merging and reAssembly. | ||
| It performs dereplication in three stages: (i) grouping bins based on | ||
| average sequence identity, (ii) merging bins within each group, | ||
| and (iii) reassembling the merged bins. | ||
| remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/magmax | ||
| type: unrestricted | ||
| categories: | ||
| - Metagenomics |
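
Stage (i) of the description, grouping bins by average sequence identity, can be pictured with a minimal sketch. It assumes pairwise ANI values (in %) have already been computed and links any two bins whose ANI meets a threshold (single linkage, via union-find), using 99 to mirror the wrapper's `--ani` default. The function name `group_bins`, the bin names and the ANI values are hypothetical; MAGmax's actual grouping procedure may differ.

```python
from typing import Dict, List, Tuple


def group_bins(
    bins: List[str],
    pairwise_ani: Dict[Tuple[str, str], float],
    min_ani: float = 99.0,
) -> List[List[str]]:
    """Group bins whose pairwise ANI (in %) is >= min_ani, using single linkage.

    Hypothetical illustration; not MAGmax's actual implementation.
    """
    parent = {b: b for b in bins}

    def find(b: str) -> str:
        # Walk up to the root of b's group, halving the path along the way.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    # Merge the groups of every pair that meets the ANI threshold.
    for (a, b), ani in pairwise_ani.items():
        if ani >= min_ani:
            parent[find(a)] = find(b)

    # Collect bins by group root, preserving the input order of bins.
    groups: Dict[str, List[str]] = {}
    for b in bins:
        groups.setdefault(find(b), []).append(b)
    return list(groups.values())


# Toy input with made-up names and ANI values.
bins = ["s1.bin1", "s2.bin7", "s2.bin9"]
ani = {("s1.bin1", "s2.bin7"): 99.4, ("s1.bin1", "s2.bin9"): 82.0}
print(group_bins(bins, ani))  # [['s1.bin1', 's2.bin7'], ['s2.bin9']]
```

Single linkage lets two bins share a group through a chain of similar bins even when they are not directly similar to each other; a stricter rule (e.g. complete linkage) is an equally plausible choice.
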
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,234 @@ | ||
| <tool id="magmax" name="MAGmax" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> | ||
| <description>bin Merging and reAssembly</description> | ||
| <macros> | ||
| <token name="@TOOL_VERSION@">1.3.0</token> | ||
| <token name="@VERSION_SUFFIX@">0</token> | ||
| <token name="@PROFILE@">25.0</token> | ||
| </macros> | ||
| <requirements> | ||
| <requirement type="package" version="@TOOL_VERSION@">magmax</requirement> | ||
| <requirement type="package" version="1.1.0">checkm2</requirement> | ||
| <requirement type="package" version="1.2.9">megahit</requirement> | ||
| </requirements> | ||
| <command detect_errors="exit_code"> | ||
| <![CDATA[ | ||
| | ||
| export CHECKM2DB='$database.fields.path' && | ||
| | ||
| mkdir 'bindir' 'readdir' 'mapiddir' 'outputs' && | ||
| | ||
| #for $bin in $bindir: | ||
| ln -s '$bin' 'bindir/${bin}.fasta' && | ||
| #end for | ||
| | ||
| #if $readdir: | ||
| #for $read in $readdir: | ||
| ln -s '$read.forward' 'readdir/${read.name}_1.${read.ext}' && | ||
| ln -s '$read.reverse' 'readdir/${read.name}_2.${read.ext}' && | ||
| #end for | ||
| #end if | ||
| | ||
| #if $mapiddir: | ||
| #for $id in $mapiddir: | ||
| ln -s '$id' 'mapiddir/${id}.txt' && | ||
| #end for | ||
| #end if | ||
| | ||
| #if $qual: | ||
| ln -s '$qual' 'quality.tsv' && | ||
| #end if | ||
| | ||
| magmax | ||
| -b 'bindir' | ||
| #if $readdir: | ||
| -r 'readdir' | ||
| #end if | ||
| #if $mapiddir: | ||
| -m 'mapiddir' | ||
| #end if | ||
| --ani ${ani} | ||
| --completeness ${completeness} | ||
| --purity ${purity} | ||
| --alignedfrac ${alignedfrac} | ||
| ${no_reassembly} | ||
| ${sensitive} | ||
| ${split} | ||
| #if $qual: | ||
| -q 'quality.tsv' | ||
| #end if | ||
| --assembler ${assembler} | ||
| -t \${GALAXY_SLOTS:-8} | ||
| -o 'outputs' | ||
| | ||
| ]]> | ||
| </command> | ||
| <inputs> | ||
| <param name="bindir" type="data" format="fasta" multiple="true" label="Input bin file(s)"/> | ||
| <param name="readdir" type="data_collection" optional="true" collection_type="list:paired" format="fastq" multiple="true" label="Input read files" | ||
| help="The element identifier of each read pair must be named after its [sample_id]; otherwise the tool cannot work. For more information, see the help section at the bottom of the tool. NOTE: this input is not needed when using the {--no-reassembly} or {--sensitive} mode!"/> | ||
| | ||
Member: Shouldn't this param only be available if --no-reassembly or --sensitive is not used?
Contributor (Author): Yes, correct, this parameter and
| <param name="mapiddir" type="data" format="txt" optional="true" multiple="true" label="Input MapID file(s)" | ||
| help="For an example of what such a file looks like, see the help section at the bottom of the tool. NOTE: this input is not needed when using the {--no-reassembly} or {--sensitive} mode!"/> | ||
| <param argument="--ani" type="integer" min="0" max="100" value="99" label="Set ANI for clustering (in %)"/> | ||
| <param argument="--completeness" type="integer" min="0" max="100" value="50" label="Set min completeness for bins (in %)"/> | ||
| <param argument="--purity" type="integer" min="0" max="100" value="95" label="Set min purity of bins (in %)"/> | ||
| <param argument="--alignedfrac" type="integer" min="0" value="0" label="Set min aligned fraction of genomes covered in the ANI calculation"/> | ||
| <param argument="--no-reassembly" type="boolean" falsevalue="" truevalue="--no-reassembly" checked="false" label="Perform dereplication without bin merging and reassembly"/> | ||
| <param argument="--sensitive" type="boolean" falsevalue="" truevalue="--sensitive" checked="false" label="Select representatives based on high connectivity" | ||
| help="Bin merging and reassembly steps are disabled"/> | ||
| <param argument="--split" type="boolean" falsevalue="" truevalue="--split" checked="false" label="Split clusters into sample-wise bins before processing"/> | ||
| <param argument="--qual" type="data" format="tabular" optional="true" label="Input CheckM2 quality report file" | ||
| help="When a file is used here the tool will skip the internal CheckM2 run and will use this file for further steps"/> | ||
| <param argument="--assembler" type="select" label="Select the assembler for the reassembly step" | ||
| help="SPAdes is recommended"> | ||
| <option value="spades" selected="true">spades</option> | ||
| <option value="mega">MEGAHIT</option> | ||
| </param> | ||
| <param name="database" type="select" label="Select CheckM2 database" help="CheckM2 DIAMOND database"> | ||
| <options from_data_table="checkm2"> | ||
| <filter type="sort_by" column="2"/> | ||
| </options> | ||
| <validator type="no_options" message="No databases are available for this version of CheckM2. Please contact the Galaxy administrators to request one be installed."/> | ||
| </param> | ||
| </inputs> | ||
| <outputs> | ||
| <data name="bins" format="fasta" label="${tool.name} on ${on_string}: Bins"> | ||
| <discover_datasets pattern="(?P&lt;designation&gt;.+)\.fasta" directory="outputs"/> | ||
| </data> | ||
| <data name="summary" format="tabular" from_work_dir="outputs/bins_checkm2_qualities.tsv" label="${tool.name} on ${on_string}: CheckM2 quality summary"/> | ||
| <data name="member" format="tabular" from_work_dir="outputs/memberships.tsv" label="${tool.name} on ${on_string}: Membership table"/> | ||
| </outputs> | ||
| <tests> | ||
| <test expect_num_outputs="3"> | ||
| <param name="bindir" ftype="fasta" value="sample_ERR3404870_metabat2_results.574.fasta,sample_ERR3405181_metabat2_results.1030.fasta"/> | ||
| <param name="sensitive" value="true"/> | ||
| <param name="no_reassembly" value="true"/> | ||
| <param name="qual" ftype="tabular" value="quality_report.tsv"/> | ||
| <param name="database" value="001"/> | ||
| </test> | ||
| <test expect_num_outputs="3"> | ||
| <param name="bindir" ftype="fasta" value="sample_ERR3404870_metabat2_results.574.fasta,sample_ERR3405181_metabat2_results.1030.fasta"/> | ||
| <param name="readdir"> | ||
| <collection type="list:paired"> | ||
| <element name="ERR3405181"> | ||
| <collection type="paired"> | ||
| <element name="forward" value="ERR3405181_1.fastq" ftype="fastq"/> | ||
| <element name="reverse" value="ERR3405181_2.fastq" ftype="fastq"/> | ||
| </collection> | ||
| </element> | ||
| <element name="ERR3404870"> | ||
| <collection type="paired"> | ||
| <element name="forward" value="ERR3404870_1.fastq" ftype="fastq"/> | ||
| <element name="reverse" value="ERR3404870_2.fastq" ftype="fastq"/> | ||
| </collection> | ||
| </element> | ||
| </collection> | ||
| </param> | ||
| <param name="mapiddir" ftype="tabular" value="ERR3404870_mapids.txt,ERR3405181_mapids.txt"/> | ||
| <param name="qual" ftype="tabular" value="quality_report.tsv"/> | ||
| <param name="database" value="001"/> | ||
| </test> | ||
| <test expect_num_outputs="3"> | ||
| <param name="bindir" ftype="fasta" value="sample_ERR3404870_metabat2_results.574.fasta,sample_ERR3405181_metabat2_results.1030.fasta"/> | ||
| <param name="readdir"> | ||
| <collection type="list:paired"> | ||
| <element name="ERR3405181"> | ||
| <collection type="paired"> | ||
| <element name="forward" value="ERR3405181_1.fastq" ftype="fastq"/> | ||
| <element name="reverse" value="ERR3405181_2.fastq" ftype="fastq"/> | ||
| </collection> | ||
| </element> | ||
| <element name="ERR3404870"> | ||
| <collection type="paired"> | ||
| <element name="forward" value="ERR3404870_1.fastq" ftype="fastq"/> | ||
| <element name="reverse" value="ERR3404870_2.fastq" ftype="fastq"/> | ||
| </collection> | ||
| </element> | ||
| </collection> | ||
| </param> | ||
| <param name="mapiddir" ftype="tabular" value="ERR3404870_mapids.txt,ERR3405181_mapids.txt"/> | ||
| <param name="qual" ftype="tabular" value="quality_report.tsv"/> | ||
| <param name="database" value="001"/> | ||
| <param name="split" value="true"/> | ||
| <param name="ani" value="1"/> | ||
| <param name="completeness" value="1"/> | ||
| <param name="purity" value="1"/> | ||
| <param name="alignedfrac" value="1"/> | ||
| </test> | ||
| <test expect_num_outputs="3"> | ||
| <param name="bindir" ftype="fasta" value="sample_ERR3404870_metabat2_results.574.fasta,sample_ERR3405181_metabat2_results.1030.fasta"/> | ||
| <param name="readdir"> | ||
| <collection type="list:paired"> | ||
| <element name="ERR3405181"> | ||
| <collection type="paired"> | ||
| <element name="forward" value="ERR3405181_1.fastq" ftype="fastq"/> | ||
| <element name="reverse" value="ERR3405181_2.fastq" ftype="fastq"/> | ||
| </collection> | ||
| </element> | ||
| <element name="ERR3404870"> | ||
| <collection type="paired"> | ||
| <element name="forward" value="ERR3404870_1.fastq" ftype="fastq"/> | ||
| <element name="reverse" value="ERR3404870_2.fastq" ftype="fastq"/> | ||
| </collection> | ||
| </element> | ||
| </collection> | ||
| </param> | ||
| <param name="mapiddir" ftype="tabular" value="ERR3404870_mapids.txt,ERR3405181_mapids.txt"/> | ||
| <param name="database" value="001"/> | ||
| </test> | ||
| <test expect_num_outputs="3"> | ||
| <param name="bindir" ftype="fasta" value="sample_ERR3404870_metabat2_results.574.fasta,sample_ERR3405181_metabat2_results.1030.fasta"/> | ||
| <param name="readdir"> | ||
| <collection type="list:paired"> | ||
| <element name="ERR3405181"> | ||
| <collection type="paired"> | ||
| <element name="forward" value="ERR3405181_1.fastq" ftype="fastq"/> | ||
| <element name="reverse" value="ERR3405181_2.fastq" ftype="fastq"/> | ||
| </collection> | ||
| </element> | ||
| <element name="ERR3404870"> | ||
| <collection type="paired"> | ||
| <element name="forward" value="ERR3404870_1.fastq" ftype="fastq"/> | ||
| <element name="reverse" value="ERR3404870_2.fastq" ftype="fastq"/> | ||
| </collection> | ||
| </element> | ||
| </collection> | ||
| </param> | ||
| <param name="mapiddir" ftype="tabular" value="ERR3404870_mapids.txt,ERR3405181_mapids.txt"/> | ||
| <param name="qual" ftype="tabular" value="quality_report.tsv"/> | ||
| <param name="database" value="001"/> | ||
| <param name="assembler" value="mega"/> | ||
| </test> | ||
| </tests> | ||
| <help> | ||
| <![CDATA[ | ||
| | ||
| **Input** | ||
| | ||
| - required: | ||
| - A collection with bins | ||
| | ||
| - optional: | ||
| - A collection of paired reads: required unless the *sensitive* or *no-reassembly* option is used. Note: the element name of each pair has to match the sample ID! | ||
| - MapID file(s): required unless the *sensitive* or *no-reassembly* option is used. | ||
| - A CheckM2 quality report file. CheckM2 can be run before this tool to skip the internal CheckM2 step. | ||
| | ||
| **MapID file(s)** | ||
| | ||
| A mapID file is a simple *.txt* file in which each line has the form *read1_ID <sampleID>C<contig1_ID>* | ||
| | ||
| - The two IDs have to be separated by a space, and each file name has to contain the sample ID. Only the element name of each read pair has to contain the sample ID; the read files themselves don't need it. | ||
| | ||
| Here is an example line from the test file (ERR3405181_mapids.txt); every line follows the same pattern: *ERR3405181.6238983 ERR3404870Ck141_1453* | ||
| | ||
| **Outputs** | ||
| | ||
| - FASTA files with the final bins | ||
| - Table summarizing the quality metrics of the dereplicated bins | ||
| - Table listing the representative bins and their member bins | ||
| | ||
| ]]> | ||
| </help> | ||
| <citations> | ||
| <citation type="doi">10.1093/bioinformatics/btaf538</citation> | ||
| </citations> | ||
| </tool> | ||
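
The mapID line format from the help text (a read ID, a space, then the sample ID, a literal `C`, and the contig ID) can be checked before running the tool. A minimal sketch, assuming the sample IDs are known from the read-pair element names; the helper name `parse_mapid_line` is hypothetical and not part of MAGmax. Because a sample ID could itself end in `C`, it tries longer sample IDs first, which is a heuristic rather than a rule from the MAGmax documentation.

```python
from typing import Iterable, Tuple


def parse_mapid_line(line: str, sample_ids: Iterable[str]) -> Tuple[str, str, str]:
    """Split one mapID line into (read_id, sample_id, contig_id).

    Hypothetical helper; assumes the '<read_ID> <sampleID>C<contig_ID>'
    layout described in the wrapper's help text.
    """
    fields = line.split()  # splits on any run of whitespace
    if len(fields) != 2:
        raise ValueError(f"expected 2 IDs per line, got {len(fields)}: {line!r}")
    read_id, tagged_contig = fields
    # Try longer sample IDs first: with IDs 'A' and 'AC', 'ACCk1' matches both.
    for sample_id in sorted(sample_ids, key=len, reverse=True):
        prefix = sample_id + "C"
        if tagged_contig.startswith(prefix) and len(tagged_contig) > len(prefix):
            return read_id, sample_id, tagged_contig[len(prefix):]
    raise ValueError(f"no known '<sampleID>C' prefix in {tagged_contig!r}")


# Sample IDs as used for the read-pair element names in the wrapper's tests.
samples = ["ERR3404870", "ERR3405181"]
print(parse_mapid_line("ERR3405181.6238983 ERR3404870Ck141_1453", samples))
# ('ERR3405181.6238983', 'ERR3404870', 'k141_1453')
```

Running it over every line of each mapID file and collecting the `ValueError`s would flag lines that do not follow the documented layout.
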
Is fastqsanger.gz supported?

I didn't find anything about it in the README, but I will ask and update it if this is supported!

I did ask about it, and *.gz files are not yet supported. Support may come in about a month, depending on whether it is included in the next release, but it is planned!