
Add workflow Metadata and Sequences from BioProjectIDs#1177

Open
gdefazio wants to merge 19 commits into galaxyproject:main from gdefazio:Metadata-and-Sequences-from-BioProjectIDs

Conversation

@gdefazio

FOR CONTRIBUTOR:

  • I have read the Adding workflows guidelines
  • License permits unrestricted use (educational + commercial)
  • Please also take note of the reviewer guidelines below to facilitate a smooth review process.

FOR REVIEWERS:

  • .dockstore.yml: file is present and aligned with creator metadata in workflow. ORCID identifiers are strongly encouraged in creator metadata. The .dockstore.yml file is required to run tests
  • Workflow is sufficiently generic to be used with lab data: it does not hardcode sample names or reference data, and it can be run without reading an accompanying tutorial.
  • In workflow: annotation field contains short description of what the workflow does. Should start with This workflow does/runs/performs … xyz … to generate/analyze/etc …
  • In workflow: workflow inputs and outputs have human-readable names (spaces are fine, no underscores, dashes only where spelling dictates it), no abbreviations unless generally understood. Altering input or output labels requires adjusting these labels in the workflow-tests.yml file as well
  • In workflow: name field should be human readable (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless generally understood
  • Workflow folder: prefer dash (-) over underscore (_), prefer all lowercase. Folder becomes repository in iwc-workflows organization and is included in TRS id
  • Readme explains what the workflow does, what valid inputs are, and what outputs users can expect. If a tutorial or other resources exist, they can be linked. If a similar workflow exists in IWC, the readme should explain the differences from the existing workflow and when one might prefer one over the other
  • Changelog contains appropriate entries
  • Large files (> 100 KB) are uploaded to Zenodo and their location URLs are used in the test file
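
For the .dockstore.yml item above, a minimal sketch of the file such repositories typically use. This is an illustration only: the descriptor path, test file path, author name, and ORCID below are placeholders, not values taken from this PR.

```yaml
version: 1.2
workflows:
  - name: main
    subclass: Galaxy
    publish: true
    # Paths are relative to the repository root (placeholders here):
    primaryDescriptorPath: /metadata-and-sequences-from-bioprojectids.ga
    testParameterFiles:
      - /metadata-and-sequences-from-bioprojectids-tests.yml
    authors:
      - name: Author Name           # placeholder
        orcid: 0000-0000-0000-0000  # placeholder; ORCIDs are strongly encouraged
```

The creator metadata inside the .ga workflow should match the `authors` entries here, since the checklist asks reviewers to verify that alignment.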


@github-actions

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 2
Passed 1
Error 0
Failure 1
Skipped 0
Failed Tests
  • ❌ metadata-and-sequences-from-BioProjectIDs.ga_1

    Problems:

    • Output with path /tmp/tmptjnsdwb0/SRR37273408reverse__6f94210b-dd5f-4e87-a3da-fa5ed472355d.fastqsanger.gz different than expected, difference (using contains):
      ( /home/runner/work/iwc/iwc/workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_SRR37273408_forward.fastq v. /tmp/tmpbnje941rtest2_SRR37273408_forward.fastq )
      Failed to find '@SRR37273408.1 M02133:60:000000000-BC45T:1:1101:15828:1331/1' in history data. (lines_diff=0).
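
The failing `contains` assertion simply checks that an expected read header line occurs in the produced FASTQ. A minimal sketch of reproducing that check locally, using the header from the error message; the file name and read data are hypothetical placeholders:

```shell
# Build a tiny gzipped FASTQ containing the expected header (sequence and
# quality lines are placeholder data), then apply the same "is the header
# present?" check that the test framework ran above.
printf '@SRR37273408.1 M02133:60:000000000-BC45T:1:1101:15828:1331/1\nACGT\n+\nIIII\n' \
  | gzip > reverse.fastqsanger.gz

# grep -c counts matching lines: 1 means the header was found,
# 0 reproduces the "Failed to find" error.
zcat reverse.fastqsanger.gz | grep -c '^@SRR37273408.1 '
```

Running the same grep against the actual workflow output is a quick way to confirm whether the header is present before re-running the full test suite.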
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: --assay (metadata download):

        • step_state: scheduled
      • Step 3: --desc (metadata download):

        • step_state: scheduled
      • Step 4: --detailed (metadata download):

        • step_state: scheduled
      • Step 5: --expand (metadata download):

        • step_state: scheduled
      • Step 6: Group by Experiments (fastq download):

        • step_state: scheduled
      • Step 7: Group by Sample (fastq download):

        • step_state: scheduled
      • Step 8: Separe BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmpkm4a4wrk/files/d/9/f/dataset_d9fef476-ab20-46d4-bae1-f379df12a8ec.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "08e3b86c277011f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 9, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 9: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "08e3b86c277011f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
          • Job 2:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "08e3b86c277011f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 10: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "08e3b86c277011f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1425250' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "08e3b86c277011f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1425250", "selector": "metadata"}
              dbkey "?"
      • Step 11: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpkm4a4wrk/job_working_directory/000/13/configs/tmpf6vi7cpx' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "08e3b86c277011f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 16, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpkm4a4wrk/job_working_directory/000/14/configs/tmpvjzvyo_g' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "08e3b86c277011f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 17, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 12: Sequences Download (toolshed.g2.bx.psu.edu/repos/iuc/fastq_dl/fastq_dl/3.0.1+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/fastq-dl:3.0.1--pyhdfd78af_0

            Command Line:

            • mkdir -p single-end paired-end logs && mapfile -t accessionsarr < "/tmp/tmpkm4a4wrk/files/5/b/6/dataset_5b6499bb-836a-48c6-98c5-d8096a3a0c4c.dat" &&  for accessionid in "${accessionsarr[@]}"; do fastq-dl --accession "$accessionid" --provider ena --only-provider   ; exit_code=$? ; if [ $exit_code -ne 0 ]; then echo "fastq-dl failed for accession: ${accessionid}" >&2 ; exit $exit_code ; break ; else mv fastq-run-info.tsv logs/"$accessionid"-fastq-run-info.tsv > /dev/null 2>&1 || true; fi ; done  && find . -maxdepth 1 -name "*_1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_2/_reverse/")"' {} \; && find . -maxdepth 1 -name "*_R1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_R2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R2/_reverse/")"' {} \; && mv *.gz single-end > /dev/null 2>&1 || true

            Exit Code:

            • 0

            Standard Error:

            • 2026-03-24 10:57:06 INFO     2026-03-24 10:57:06:root:INFO -     download.py:189
                                           Query: SRR37273407                                 
                                  INFO     2026-03-24 10:57:06:root:INFO -     download.py:190
                                           Archive: ena                                       
                                  INFO     2026-03-24 10:57:06:root:INFO -     download.py:195
                                           Total Runs To Download: 1                          
                                  INFO     2026-03-24 10:57:06:root:INFO -     download.py:214
                                           Working on run SRR37273407...                      
                                  INFO     2026-03-24 10:57:06:root:INFO -          ena.py:167
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/15/working/SRR37273407_1.fastq.gz FTP           
                                           download attempt 1                                 
              2026-03-24 10:57:21 INFO     2026-03-24 10:57:21:root:INFO -          ena.py:195
                                           Successfully downloaded                            
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/15/working/SRR37273407_1.fastq.gz               
                                  INFO     2026-03-24 10:57:21:root:INFO -     download.py:311
                                           Writing metadata to                                
                                           /tmp/tmpkm4a4wrk/job_working_direct                
                                           ory/000/15/working/fastq-run-info.t                
                                           sv                                                 
              2026-03-24 10:57:24 INFO     2026-03-24 10:57:24:root:INFO -     download.py:189
                                           Query: SRR37273408                                 
                                  INFO     2026-03-24 10:57:24:root:INFO -     download.py:190
                                           Archive: ena                                       
                                  INFO     2026-03-24 10:57:24:root:INFO -     download.py:195
                                           Total Runs To Download: 1                          
                                  INFO     2026-03-24 10:57:24:root:INFO -     download.py:214
                                           Working on run SRR37273408...                      
                                  INFO     2026-03-24 10:57:24:root:INFO -          ena.py:167
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/15/working/SRR37273408_1.fastq.gz FTP           
                                           download attempt 1                                 
              2026-03-24 10:58:32 INFO     2026-03-24 10:58:32:root:INFO -          ena.py:195
                                           Successfully downloaded                            
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/15/working/SRR37273408_1.fastq.gz               
                                  INFO     2026-03-24 10:58:32:root:INFO -          ena.py:167
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/15/working/SRR37273408_2.fastq.gz FTP           
                                           download attempt 1                                 
              2026-03-24 10:59:04 INFO     2026-03-24 10:59:04:root:INFO -          ena.py:195
                                           Successfully downloaded                            
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/15/working/SRR37273408_2.fastq.gz               
                                  INFO     2026-03-24 10:59:04:root:INFO -     download.py:311
                                           Writing metadata to                                
                                           /tmp/tmpkm4a4wrk/job_working_direct                
                                           ory/000/15/working/fastq-run-info.t                
                                           sv                                                 
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "08e3b86c277011f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              group_by_experiment false
              group_by_sample false
              input_type {"__current_case__": 1, "accessions_file": {"values": [{"id": 18, "src": "dce"}]}, "select_input_type": "accessions_list"}
              only_download_metadata false
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/fastq-dl:3.0.1--pyhdfd78af_0

            Command Line:

            • mkdir -p single-end paired-end logs && mapfile -t accessionsarr < "/tmp/tmpkm4a4wrk/files/a/0/7/dataset_a0732496-dd24-49dc-a974-786aadbd7379.dat" &&  for accessionid in "${accessionsarr[@]}"; do fastq-dl --accession "$accessionid" --provider ena --only-provider   ; exit_code=$? ; if [ $exit_code -ne 0 ]; then echo "fastq-dl failed for accession: ${accessionid}" >&2 ; exit $exit_code ; break ; else mv fastq-run-info.tsv logs/"$accessionid"-fastq-run-info.tsv > /dev/null 2>&1 || true; fi ; done  && find . -maxdepth 1 -name "*_1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_2/_reverse/")"' {} \; && find . -maxdepth 1 -name "*_R1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_R2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R2/_reverse/")"' {} \; && mv *.gz single-end > /dev/null 2>&1 || true

            Exit Code:

            • 0

            Standard Error:

            • 2026-03-24 10:59:17 INFO     2026-03-24 10:59:17:root:INFO -     download.py:189
                                           Query: SRR37073390                                 
                                  INFO     2026-03-24 10:59:17:root:INFO -     download.py:190
                                           Archive: ena                                       
                                  INFO     2026-03-24 10:59:17:root:INFO -     download.py:195
                                           Total Runs To Download: 1                          
                                  INFO     2026-03-24 10:59:17:root:INFO -     download.py:214
                                           Working on run SRR37073390...                      
                                  INFO     2026-03-24 10:59:17:root:INFO -          ena.py:167
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/16/working/SRR37073390_1.fastq.gz FTP           
                                           download attempt 1                                 
              2026-03-24 11:01:29 INFO     2026-03-24 11:01:29:root:INFO -          ena.py:195
                                           Successfully downloaded                            
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/16/working/SRR37073390_1.fastq.gz               
                                  INFO     2026-03-24 11:01:29:root:INFO -          ena.py:167
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/16/working/SRR37073390_2.fastq.gz FTP           
                                           download attempt 1                                 
              2026-03-24 11:02:22 INFO     2026-03-24 11:02:22:root:INFO -          ena.py:195
                                           Successfully downloaded                            
                                           /tmp/tmpkm4a4wrk/job_working_directory/0           
                                           00/16/working/SRR37073390_2.fastq.gz               
                                  INFO     2026-03-24 11:02:22:root:INFO -     download.py:311
                                           Writing metadata to                                
                                           /tmp/tmpkm4a4wrk/job_working_direct                
                                           ory/000/16/working/fastq-run-info.t                
                                           sv                                                 
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "08e3b86c277011f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              group_by_experiment false
              group_by_sample false
              input_type {"__current_case__": 1, "accessions_file": {"values": [{"id": 19, "src": "dce"}]}, "select_input_type": "accessions_list"}
              only_download_metadata false
    • Other invocation details
      • history_id

        • 727741a4ae178bc3
      • history_state

        • ok
      • invocation_id

        • 727741a4ae178bc3
      • invocation_state

        • scheduled
      • workflow_id

        • 727741a4ae178bc3
Passed Tests
  • ✅ metadata-and-sequences-from-BioProjectIDs.ga_0

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: --assay (metadata download):

        • step_state: scheduled
      • Step 3: --desc (metadata download):

        • step_state: scheduled
      • Step 4: --detailed (metadata download):

        • step_state: scheduled
      • Step 5: --expand (metadata download):

        • step_state: scheduled
      • Step 6: Group by Experiments (fastq download):

        • step_state: scheduled
      • Step 7: Group by Sample (fastq download):

        • step_state: scheduled
      • Step 8: Separe BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmpkm4a4wrk/files/9/0/8/dataset_908eb407-3c6f-4e71-aaf1-a0ba003b29ad.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tabular"
              __workflow_invocation_uuid__ "0de38f14276f11f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 1, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 9: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "0de38f14276f11f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 10: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "0de38f14276f11f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
      • Step 11: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpkm4a4wrk/job_working_directory/000/5/configs/tmppvgqdg_g' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "0de38f14276f11f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 3, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 12: Sequences Download (toolshed.g2.bx.psu.edu/repos/iuc/fastq_dl/fastq_dl/3.0.1+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/fastq-dl:3.0.1--pyhdfd78af_0

            Command Line:

            • mkdir -p single-end paired-end logs && mapfile -t accessionsarr < "/tmp/tmpkm4a4wrk/files/0/5/4/dataset_05477c4d-3056-43a0-8a86-f54bb682d718.dat" &&  for accessionid in "${accessionsarr[@]}"; do fastq-dl --accession "$accessionid" --provider ena --only-provider   ; exit_code=$? ; if [ $exit_code -ne 0 ]; then echo "fastq-dl failed for accession: ${accessionid}" >&2 ; exit $exit_code ; break ; else mv fastq-run-info.tsv logs/"$accessionid"-fastq-run-info.tsv > /dev/null 2>&1 || true; fi ; done  && find . -maxdepth 1 -name "*_1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_2/_reverse/")"' {} \; && find . -maxdepth 1 -name "*_R1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_R2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R2/_reverse/")"' {} \; && mv *.gz single-end > /dev/null 2>&1 || true

            Exit Code:

            • 0

            Standard Error:

            • 2026-03-24 10:51:52 INFO  download.py:189 - Query: SRR37073390
              2026-03-24 10:51:52 INFO  download.py:190 - Archive: ena
              2026-03-24 10:51:52 INFO  download.py:195 - Total Runs To Download: 1
              2026-03-24 10:51:52 INFO  download.py:214 - Working on run SRR37073390...
              2026-03-24 10:51:52 INFO  ena.py:167 - /tmp/tmpkm4a4wrk/job_working_directory/000/6/working/SRR37073390_1.fastq.gz FTP download attempt 1
              2026-03-24 10:53:43 INFO  ena.py:195 - Successfully downloaded /tmp/tmpkm4a4wrk/job_working_directory/000/6/working/SRR37073390_1.fastq.gz
              2026-03-24 10:53:43 INFO  ena.py:167 - /tmp/tmpkm4a4wrk/job_working_directory/000/6/working/SRR37073390_2.fastq.gz FTP download attempt 1
              2026-03-24 10:54:51 INFO  ena.py:195 - Successfully downloaded /tmp/tmpkm4a4wrk/job_working_directory/000/6/working/SRR37073390_2.fastq.gz
              2026-03-24 10:54:51 INFO  download.py:311 - Writing metadata to /tmp/tmpkm4a4wrk/job_working_directory/000/6/working/fastq-run-info.tsv

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "0de38f14276f11f1a1ae000d3a3ac48c"
              chromInfo "/tmp/tmpkm4a4wrk/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              group_by_experiment false
              group_by_sample false
              input_type {"__current_case__": 1, "accessions_file": {"values": [{"id": 4, "src": "dce"}]}, "select_input_type": "accessions_list"}
              only_download_metadata false
    • Other invocation details
      • history_id

        • 12605b074346335e
      • history_state

        • ok
      • invocation_id

        • 12605b074346335e
      • invocation_state

        • scheduled
      • workflow_id

        • 727741a4ae178bc3

@github-actions

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 2
Passed 1
Error 0
Failure 1
Skipped 0
Failed Tests
  • ❌ metadata-and-sequences-from-BioProjectIDs.ga_1

    Problems:

    • Output with path /tmp/tmpy64g7xaq/pysradb search metadata on  table__d3d0e3cf-9b64-439d-aad9-84372f9be6c6.tsv different than expected, difference (using diff):
      ( /home/runner/work/iwc/iwc/workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_metadata_file_split_file_000000.txt.tsv v. /tmp/tmps3avenf_test2_metadata_file_split_file_000000.txt.tsv )
      --- local_file
      +++ history_data
      @@ -1,3 +1,3 @@
      -run_accession	study_accession	study_title	experiment_accession	experiment_title	experiment_desc	organism_taxid	organism_name	library_name	library_strategy	library_source	library_selection	library_layout	sample_accession	sample_title	biosample	bioproject	instrument	instrument_model	instrument_model_desc	total_spots	total_size	run_total_spots	run_total_bases	run_alias	public_filename	public_size	public_date	public_md5	public_version	public_semantic_name	public_supertype	public_sratoolkit	aws_url	aws_free_egress	aws_access_type	public_url	ncbi_url	ncbi_free_egress	ncbi_access_type	gcp_url	gcp_free_egress	gcp_access_type	experiment_alias	strain	isolation_source	collection_date	geo_loc_name	sample_type	biomaterial_provider	biosamplemodel	ena_fastq_http	ena_fastq_http_1	ena_fastq_http_2	ena_fastq_ftp	ena_fastq_ftp_1	ena_fastq_ftp_2
      -SRR37273407	SRP677941	Complete genome assembly of the first Streptomyces acidocola isolate from Malaysia using Nanopore long-read sequencing and Illumina polishing	SRX32206378	Nanopore DNA-Seq of Streptomyces acidicola	Nanopore DNA-Seq of Streptomyces acidicola	2596892	Streptomyces acidicola	TPS3_Nanopore	WGS	GENOMIC	RANDOM	SINGLE	SRS28131937		SAMN55411282	PRJNA1425250	MinION	MinION	OXFORD_NANOPORE	45419	184487295	45419	203067398	TPS3_nanopore_hac.fastq.gz	SRR37273407	184489529	2026-02-18 05:58:18	24eda2099b77249881e6957ff82b8498	1	SRA Normalized	Primary ETL	1	https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR37273407/SRR37273407	worldwide	anonymous	https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos10/sra-pub-run-1001/SRR037/37273/SRR37273407/SRR37273407.1	https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos10/sra-pub-run-1001/SRR037/37273/SRR37273407/SRR37273407.1	worldwide	anonymous	gs://sra-pub-run-111/SRR37273407/SRR37273407.1	gs.us-east1	gcp identity		TPS3	marine sediment	2013-03	Malaysia: Tioman Island, Pahang	cell culture	Prof Annie Tan, Universiti Malaya	Microbe, viral or environmental		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR372/007/SRR37273407/SRR37273407_1.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR372/007/SRR37273407/SRR37273407_1.fastq.gz	
      -SRR37273408	SRP677941	Complete genome assembly of the first Streptomyces acidocola isolate from Malaysia using Nanopore long-read sequencing and Illumina polishing	SRX32206377	Illumina DNA-Seq of Streptomyces acidicola	Illumina DNA-Seq of Streptomyces acidicola	2596892	Streptomyces acidicola	TPS3_Illumina	WGS	GENOMIC	RANDOM	PAIRED	SRS28131937		SAMN55411282	PRJNA1425250	Illumina MiSeq	Illumina MiSeq	ILLUMINA	2935209	1013212413	2935209	1472053376	TPS3_S1_L001_R1_001.fastq.gz	SRR37273408.lite	376905109	2026-02-18 07:15:02	de5fc51b2d7f6f455ec7659b0f3467fc	1	SRA Lite	Primary ETL	1	s3://sra-pub-zq-5/SRR37273408/SRR37273408.lite.1	s3.us-east-1	aws identity	https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos10/sra-pub-zq-1002/SRR037/37273/SRR37273408/SRR37273408.lite.1	https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos10/sra-pub-zq-1002/SRR037/37273/SRR37273408/SRR37273408.lite.1	worldwide	anonymous	gs://sra-pub-zq-109/SRR37273408/SRR37273408.lite.1	gs.us-east1	gcp identity		TPS3	marine sediment	2013-03	Malaysia: Tioman Island, Pahang	cell culture	Prof Annie Tan, Universiti Malaya	Microbe, viral or environmental		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR372/008/SRR37273408/SRR37273408_1.fastq.gz	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR372/008/SRR37273408/SRR37273408_2.fastq.gz		era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR372/008/SRR37273408/SRR37273408_1.fastq.gz	era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR372/008/SRR37273408/SRR37273408_2.fastq.gz
      +run_accession	study_accession	study_title	experiment_accession	experiment_title	experiment_desc	organism_taxid	organism_name	library_name	library_strategy	library_source	library_selection	library_layout	sample_accession	sample_title	biosample	bioproject	instrument	instrument_model	instrument_model_desc	total_spots	total_size	run_total_spots	run_total_bases	run_alias	public_filename	public_url	public_size	public_date	public_md5	public_version	public_semantic_name	public_supertype	public_sratoolkit	aws_url	aws_free_egress	aws_access_type	ncbi_url	ncbi_free_egress	ncbi_access_type	gcp_url	gcp_free_egress	gcp_access_type	experiment_alias	strain	isolation_source	collection_date	geo_loc_name	sample_type	biomaterial_provider	biosamplemodel	ena_fastq_http	ena_fastq_http_1	ena_fastq_http_2	ena_fastq_ftp	ena_fastq_ftp_1	ena_fastq_ftp_2
      +SRR37273407	SRP677941	Complete genome assembly of the first Streptomyces acidocola isolate from Malaysia using Nanopore long-read sequencing and Illumina polishing	SRX32206378	Nanopore DNA-Seq of Streptomyces acidicola	Nanopore DNA-Seq of Streptomyces acidicola	2596892	Streptomyces acidicola	TPS3_Nanopore	WGS	GENOMIC	RANDOM	SINGLE	SRS28131937		SAMN55411282	PRJNA1425250	MinION	MinION	OXFORD_NANOPORE	45419	184487295	45419	203067398	TPS3_nanopore_hac.fastq.gz	TPS3_nanopore_hac.fastq.gz	https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos10/sra-pub-run-1001/SRR037/37273/SRR37273407/SRR37273407.1	210524830	2026-02-18 05:57:53	ff8a0c7837c47dc8a0331639c25ebda9	1	fastq	Original	0	s3://sra-pub-src-18/SRR37273407/TPS3_nanopore_hac.fastq.gz.1	-	Use Cloud Data Delivery	https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos10/sra-pub-run-1001/SRR037/37273/SRR37273407/SRR37273407.1	worldwide	anonymous	gs://sra-pub-run-111/SRR37273407/SRR37273407.1	gs.us-east1	gcp identity		TPS3	marine sediment	2013-03	Malaysia: Tioman Island, Pahang	cell culture	Prof Annie Tan, Universiti Malaya	Microbe, viral or environmental		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR372/007/SRR37273407/SRR37273407_1.fastq.gz			era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR372/007/SRR37273407/SRR37273407_1.fastq.gz	
      +SRR37273408	SRP677941	Complete genome assembly of the first Streptomyces acidocola isolate from Malaysia using Nanopore long-read sequencing and Illumina polishing	SRX32206377	Illumina DNA-Seq of Streptomyces acidicola	Illumina DNA-Seq of Streptomyces acidicola	2596892	Streptomyces acidicola	TPS3_Illumina	WGS	GENOMIC	RANDOM	PAIRED	SRS28131937		SAMN55411282	PRJNA1425250	Illumina MiSeq	Illumina MiSeq	ILLUMINA	2935209	1013212413	2935209	1472053376	TPS3_S1_L001_R1_001.fastq.gz	SRR37273408.lite	https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos10/sra-pub-zq-1002/SRR037/37273/SRR37273408/SRR37273408.lite.1	376905109	2026-02-18 07:15:02	de5fc51b2d7f6f455ec7659b0f3467fc	1	SRA Lite	Primary ETL	1	s3://sra-pub-zq-5/SRR37273408/SRR37273408.lite.1	s3.us-east-1	aws identity	https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos10/sra-pub-zq-1002/SRR037/37273/SRR37273408/SRR37273408.lite.1	worldwide	anonymous	gs://sra-pub-zq-109/SRR37273408/SRR37273408.lite.1	gs.us-east1	gcp identity		TPS3	marine sediment	2013-03	Malaysia: Tioman Island, Pahang	cell culture	Prof Annie Tan, Universiti Malaya	Microbe, viral or environmental		http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR372/008/SRR37273408/SRR37273408_1.fastq.gz	http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR372/008/SRR37273408/SRR37273408_2.fastq.gz		era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR372/008/SRR37273408/SRR37273408_1.fastq.gz	era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR372/008/SRR37273408/SRR37273408_2.fastq.gz
      
      
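    The diff above shows the same two runs with identical accessions, but with volatile fields changed since the expected file was recorded: the `public_url` column has moved, and file sizes, md5 checksums, dates, and cloud mirror paths differ. This looks like ordinary SRA metadata drift rather than a workflow bug. A minimal sketch (a hypothetical helper, not part of this workflow or its tests) of comparing two pysradb metadata TSVs only on fields that stay stable across mirror updates:

    ```python
    import csv
    import io

    # Hypothetical stable subset: accession and layout fields that do not
    # change when SRA re-mirrors files or reorders detailed-metadata columns.
    STABLE = ["run_accession", "study_accession", "experiment_accession",
              "library_layout", "biosample", "bioproject"]

    def stable_rows(tsv_text):
        """Parse a TSV by header name and keep only the stable columns."""
        rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
        return [{k: row.get(k, "") for k in STABLE} for row in rows]

    def metadata_equivalent(expected_tsv, actual_tsv):
        """True when both TSVs describe the same runs, ignoring volatile
        fields (URLs, sizes, md5s) and column order."""
        return stable_rows(expected_tsv) == stable_rows(actual_tsv)
    ```

    Comparing on a stable subset like this (or using planemo's line-diff tolerance options) would keep the test green when NCBI reshuffles the detailed-metadata columns.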

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: --assay (metadata download):

        • step_state: scheduled
      • Step 3: --desc (metadata download):

        • step_state: scheduled
      • Step 4: --detailed (metadata download):

        • step_state: scheduled
      • Step 5: --expand (metadata download):

        • step_state: scheduled
      • Step 6: Group by Experiments (fastq download):

        • step_state: scheduled
      • Step 7: Group by Sample (fastq download):

        • step_state: scheduled
      • Step 8: Separe BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmpabiru3i0/files/2/4/0/dataset_24064dad-8e03-4536-a14a-fb53fe6a37c3.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "ff55b0ee278211f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 9, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 9: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "ff55b0ee278211f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
          • Job 2:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "ff55b0ee278211f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 10: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1425250' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "ff55b0ee278211f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1425250", "selector": "metadata"}
              dbkey "?"
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "ff55b0ee278211f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
      • Step 11: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpabiru3i0/job_working_directory/000/13/configs/tmpqj0rvf_x' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "ff55b0ee278211f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 16, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpabiru3i0/job_working_directory/000/14/configs/tmp01m_cuuh' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "ff55b0ee278211f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 17, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 12: Sequences Download (toolshed.g2.bx.psu.edu/repos/iuc/fastq_dl/fastq_dl/3.0.1+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/fastq-dl:3.0.1--pyhdfd78af_0

            Command Line:

            • mkdir -p single-end paired-end logs && mapfile -t accessionsarr < "/tmp/tmpabiru3i0/files/f/3/3/dataset_f3357cc6-2d85-475d-ae3a-a7ee187244e8.dat" &&  for accessionid in "${accessionsarr[@]}"; do fastq-dl --accession "$accessionid" --provider ena --only-provider   ; exit_code=$? ; if [ $exit_code -ne 0 ]; then echo "fastq-dl failed for accession: ${accessionid}" >&2 ; exit $exit_code ; break ; else mv fastq-run-info.tsv logs/"$accessionid"-fastq-run-info.tsv > /dev/null 2>&1 || true; fi ; done  && find . -maxdepth 1 -name "*_1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_2/_reverse/")"' {} \; && find . -maxdepth 1 -name "*_R1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_R2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R2/_reverse/")"' {} \; && mv *.gz single-end > /dev/null 2>&1 || true
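            The rename convention embedded in the command above (`_1`/`_R1` to `_forward`, `_2`/`_R2` to `_reverse`, remaining `.gz` files to `single-end`) can be sketched in isolation; the accession names below are made up for illustration:

            ```shell
            # Standalone sketch of the wrapper's rename convention, run on
            # hypothetical files in a throwaway directory.
            set -eu
            cd "$(mktemp -d)"
            mkdir -p single-end paired-end
            touch SRR0000001_1.fastq.gz SRR0000001_2.fastq.gz SRR0000002.fastq.gz
            # _1 -> _forward, _2 -> _reverse (same first-occurrence sed
            # semantics the wrapper relies on)
            for f in *_1.fastq.gz; do
              [ -e "$f" ] && mv "$f" "paired-end/$(echo "$f" | sed 's/_1/_forward/')"
            done
            for f in *_2.fastq.gz; do
              [ -e "$f" ] && mv "$f" "paired-end/$(echo "$f" | sed 's/_2/_reverse/')"
            done
            # Whatever is left over is treated as single-end
            mv *.gz single-end 2>/dev/null || true
            ```

            Note the first-occurrence `sed` substitution: an accession whose ID itself contains `_1` before the read suffix would be renamed at the wrong spot, which is worth keeping in mind when reviewing the wrapper.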

            Exit Code:

            • 0

            Standard Error:

            • 2026-03-24 13:12:50 INFO  download.py:189 - Query: SRR37273407
              2026-03-24 13:12:50 INFO  download.py:190 - Archive: ena
              2026-03-24 13:12:50 INFO  download.py:195 - Total Runs To Download: 1
              2026-03-24 13:12:50 INFO  download.py:214 - Working on run SRR37273407...
              2026-03-24 13:12:50 INFO  ena.py:167 - /tmp/tmpabiru3i0/job_working_directory/000/15/working/SRR37273407_1.fastq.gz FTP download attempt 1
              2026-03-24 13:13:18 INFO  ena.py:195 - Successfully downloaded /tmp/tmpabiru3i0/job_working_directory/000/15/working/SRR37273407_1.fastq.gz
              2026-03-24 13:13:18 INFO  download.py:311 - Writing metadata to /tmp/tmpabiru3i0/job_working_directory/000/15/working/fastq-run-info.tsv
              2026-03-24 13:13:21 INFO  download.py:189 - Query: SRR37273408
              2026-03-24 13:13:21 INFO  download.py:190 - Archive: ena
              2026-03-24 13:13:21 INFO  download.py:195 - Total Runs To Download: 1
              2026-03-24 13:13:21 INFO  download.py:214 - Working on run SRR37273408...
              2026-03-24 13:13:21 INFO  ena.py:167 - /tmp/tmpabiru3i0/job_working_directory/000/15/working/SRR37273408_1.fastq.gz FTP download attempt 1
              2026-03-24 13:15:40 INFO  ena.py:195 - Successfully downloaded /tmp/tmpabiru3i0/job_working_directory/000/15/working/SRR37273408_1.fastq.gz
              2026-03-24 13:15:40 INFO  ena.py:167 - /tmp/tmpabiru3i0/job_working_directory/000/15/working/SRR37273408_2.fastq.gz FTP download attempt 1
              2026-03-24 13:18:18 INFO  ena.py:195 - Successfully downloaded /tmp/tmpabiru3i0/job_working_directory/000/15/working/SRR37273408_2.fastq.gz
              2026-03-24 13:18:18 INFO  download.py:311 - Writing metadata to /tmp/tmpabiru3i0/job_working_directory/000/15/working/fastq-run-info.tsv

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "ff55b0ee278211f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              group_by_experiment false
              group_by_sample false
              input_type {"__current_case__": 1, "accessions_file": {"values": [{"id": 18, "src": "dce"}]}, "select_input_type": "accessions_list"}
              only_download_metadata false
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/fastq-dl:3.0.1--pyhdfd78af_0

            Command Line:

            • mkdir -p single-end paired-end logs && mapfile -t accessionsarr < "/tmp/tmpabiru3i0/files/a/c/5/dataset_ac5f9edf-a538-4667-af4e-85c0f93dac92.dat" &&  for accessionid in "${accessionsarr[@]}"; do fastq-dl --accession "$accessionid" --provider ena --only-provider   ; exit_code=$? ; if [ $exit_code -ne 0 ]; then echo "fastq-dl failed for accession: ${accessionid}" >&2 ; exit $exit_code ; break ; else mv fastq-run-info.tsv logs/"$accessionid"-fastq-run-info.tsv > /dev/null 2>&1 || true; fi ; done  && find . -maxdepth 1 -name "*_1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_2/_reverse/")"' {} \; && find . -maxdepth 1 -name "*_R1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_R2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R2/_reverse/")"' {} \; && mv *.gz single-end > /dev/null 2>&1 || true

            Exit Code:

            • 0

            Standard Error:

            • 2026-03-24 13:18:30 INFO  download.py:189 - Query: SRR37073390
              2026-03-24 13:18:30 INFO  download.py:190 - Archive: ena
              2026-03-24 13:18:30 INFO  download.py:195 - Total Runs To Download: 1
              2026-03-24 13:18:30 INFO  download.py:214 - Working on run SRR37073390...
              2026-03-24 13:18:30 INFO  ena.py:167 - /tmp/tmpabiru3i0/job_working_directory/000/16/working/SRR37073390_1.fastq.gz FTP download attempt 1
              2026-03-24 13:22:04 INFO  ena.py:195 - Successfully downloaded /tmp/tmpabiru3i0/job_working_directory/000/16/working/SRR37073390_1.fastq.gz
              2026-03-24 13:22:04 INFO  ena.py:167 - /tmp/tmpabiru3i0/job_working_directory/000/16/working/SRR37073390_2.fastq.gz FTP download attempt 1
              2026-03-24 13:24:48 INFO  ena.py:195 - Successfully downloaded /tmp/tmpabiru3i0/job_working_directory/000/16/working/SRR37073390_2.fastq.gz
              2026-03-24 13:24:48 INFO  download.py:311 - Writing metadata to /tmp/tmpabiru3i0/job_working_directory/000/16/working/fastq-run-info.tsv

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "ff55b0ee278211f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              group_by_experiment false
              group_by_sample false
              input_type {"__current_case__": 1, "accessions_file": {"values": [{"id": 19, "src": "dce"}]}, "select_input_type": "accessions_list"}
              only_download_metadata false
    • Other invocation details
      • history_id

        • 5a6a5751b6a153f4
      • history_state

        • ok
      • invocation_id

        • 5a6a5751b6a153f4
      • invocation_state

        • scheduled
      • workflow_id

        • 5a6a5751b6a153f4
Passed Tests
  • ✅ metadata-and-sequences-from-BioProjectIDs.ga_0

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: --assay (metadata download):

        • step_state: scheduled
      • Step 3: --desc (metadata download):

        • step_state: scheduled
      • Step 4: --detailed (metadata download):

        • step_state: scheduled
      • Step 5: --expand (metadata download):

        • step_state: scheduled
      • Step 6: Group by Experiments (fastq download):

        • step_state: scheduled
      • Step 7: Group by Sample (fastq download):

        • step_state: scheduled
      • Step 8: Separe BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmpabiru3i0/files/6/5/0/dataset_650eb191-696e-426f-8905-82b86ce254eb.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tabular"
              __workflow_invocation_uuid__ "7108edcc277f11f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 1, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 9: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "7108edcc277f11f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 10: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "7108edcc277f11f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
      • Step 11: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpabiru3i0/job_working_directory/000/5/configs/tmpbv_6j9a9' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "7108edcc277f11f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 3, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 12: Sequences Download (toolshed.g2.bx.psu.edu/repos/iuc/fastq_dl/fastq_dl/3.0.1+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/fastq-dl:3.0.1--pyhdfd78af_0

            Command Line:

            • mkdir -p single-end paired-end logs && mapfile -t accessionsarr < "/tmp/tmpabiru3i0/files/8/e/f/dataset_8ef8b367-6528-49e8-ae21-816f4d9f5d98.dat" &&  for accessionid in "${accessionsarr[@]}"; do fastq-dl --accession "$accessionid" --provider ena --only-provider   ; exit_code=$? ; if [ $exit_code -ne 0 ]; then echo "fastq-dl failed for accession: ${accessionid}" >&2 ; exit $exit_code ; break ; else mv fastq-run-info.tsv logs/"$accessionid"-fastq-run-info.tsv > /dev/null 2>&1 || true; fi ; done  && find . -maxdepth 1 -name "*_1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_2/_reverse/")"' {} \; && find . -maxdepth 1 -name "*_R1.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R1/_forward/")"' {} \; && find . -maxdepth 1 -name "*_R2.fastq.gz" -exec bash -c 'mv "$0" "paired-end/$(basename "$0" | sed "s/_R2/_reverse/")"' {} \; && mv *.gz single-end > /dev/null 2>&1 || true

            Exit Code:

            • 0

            Standard Error:

            • 2026-03-24 12:48:32 INFO     2026-03-24 12:48:32:root:INFO -     download.py:189
                                           Query: SRR37073390                                 
                                  INFO     2026-03-24 12:48:32:root:INFO -     download.py:190
                                           Archive: ena                                       
                                  INFO     2026-03-24 12:48:32:root:INFO -     download.py:195
                                           Total Runs To Download: 1                          
                                  INFO     2026-03-24 12:48:32:root:INFO -     download.py:214
                                           Working on run SRR37073390...                      
                                  INFO     2026-03-24 12:48:32:root:INFO -          ena.py:167
                                           /tmp/tmpabiru3i0/job_working_directory/0           
                                           00/6/working/SRR37073390_1.fastq.gz FTP            
                                           download attempt 1                                 
              2026-03-24 12:50:18 INFO     2026-03-24 12:50:18:root:INFO -          ena.py:195
                                           Successfully downloaded                            
                                           /tmp/tmpabiru3i0/job_working_directory/0           
                                           00/6/working/SRR37073390_1.fastq.gz                
                                  INFO     2026-03-24 12:50:18:root:INFO -          ena.py:167
                                           /tmp/tmpabiru3i0/job_working_directory/0           
                                           00/6/working/SRR37073390_2.fastq.gz FTP            
                                           download attempt 1                                 
              2026-03-24 13:10:37 INFO     2026-03-24 13:10:37:root:INFO -          ena.py:195
                                           Successfully downloaded                            
                                           /tmp/tmpabiru3i0/job_working_directory/0           
                                           00/6/working/SRR37073390_2.fastq.gz                
                                  INFO     2026-03-24 13:10:37:root:INFO -     download.py:311
                                           Writing metadata to                                
                                           /tmp/tmpabiru3i0/job_working_direct                
                                           ory/000/6/working/fastq-run-info.ts                
                                           v                                                  
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "7108edcc277f11f1a1ae70a8a56e7439"
              chromInfo "/tmp/tmpabiru3i0/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              group_by_experiment false
              group_by_sample false
              input_type {"__current_case__": 1, "accessions_file": {"values": [{"id": 4, "src": "dce"}]}, "select_input_type": "accessions_list"}
              only_download_metadata false
    • Other invocation details
      • history_id

        • 9a7ddd52088ca260
      • history_state

        • ok
      • invocation_id

        • 9a7ddd52088ca260
      • invocation_state

        • scheduled
      • workflow_id

        • 5a6a5751b6a153f4

Contributor

Copilot AI left a comment

Pull request overview

Adds a new Galaxy workflow that downloads SRA metadata tables and FASTQ collections starting from a list of BioProject IDs, along with packaging/registry metadata and workflow tests.

Changes:

  • Added the metadata-and-sequences-from-BioProjectIDs Galaxy workflow using pysradb search + fastq-dl.
  • Added Dockstore/WorkflowHub configuration plus README and changelog for the new workflow.
  • Added workflow tests and associated test-data fixtures (BioProject ID lists, expected metadata TSVs, and FASTQ snippets).

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 16 comments.

File Description
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/metadata-and-sequences-from-BioProjectIDs.ga New Galaxy workflow definition (inputs/steps/outputs).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/metadata-and-sequences-from-BioProjectIDs-tests.yml New workflow test cases validating metadata + FASTQ outputs.
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/README.md Workflow documentation (purpose, inputs, outputs).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/CHANGELOG.md Initial changelog entry for the workflow.
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/.workflowhub.yml WorkflowHub publishing configuration for the workflow.
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/.dockstore.yml Dockstore descriptor pointing to the workflow and test definitions.
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test1_single_prj_pe.txt Test input BioProject ID list (single project).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test1_metadata_file_split_file_000000.txt.tsv Expected metadata TSV for test 1.
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test1_paired_end_collection_forward.fastq Expected FASTQ snippet for test 1 (forward).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test1_paired_end_collection_reverse.fastq Expected FASTQ snippet for test 1 (reverse).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_multiple_prj_mixed.txt Test input BioProject ID list (multiple projects).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_metadata_file_split_file_000000.txt.tsv Expected metadata TSV for test 2 (project 1).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_metadata_file_split_file_000001.txt.tsv Expected metadata TSV for test 2 (project 2).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_SRR37273407_forward.fastq Expected FASTQ snippet for test 2 (single-end/forward-only run).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_SRR37273408_forward.fastq Expected FASTQ snippet for test 2 (paired-end forward).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_SRR37273408_reverse.fastq Expected FASTQ snippet for test 2 (paired-end reverse).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_SRR37073390_forward.fastq Expected FASTQ snippet for test 2 (paired-end forward).
workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_SRR37073390_reverse.fastq Expected FASTQ snippet for test 2 (paired-end reverse).
.idea/.gitignore Adds JetBrains IDE ignore rules within a committed .idea/ directory.
Files not reviewed (1)
  • .idea/.gitignore: Language not supported

Comment on lines +6 to +8
primaryDescriptorPath: /metadata-and-sequences-from-BioProjectIDs.ga
testParameterFiles:
- /metadata-and-sequences-from-BioProjectIDs-tests.yml
Copilot AI Mar 27, 2026

The referenced workflow/test filenames in primaryDescriptorPath / testParameterFiles include uppercase letters and “BioProjectIDs” without a space. For new IWC workflow additions, filenames/folder names are expected to be lowercase with dashes and human-readable wording (e.g., ...-bioproject-ids...). Consider renaming the workflow and test files (and updating these paths) to match that convention.
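For illustration, the lowercase part of that rename can be derived mechanically. This is a hedged sketch only: plain lowercasing is shown, and whether to additionally insert a dash (e.g. `bioproject-ids` as in the comment's example) is a manual naming decision, not something `tr` can infer.

```shell
# Sketch: derive an all-lowercase slug from the current mixed-case name.
# Dash placement around "ids" (if desired) must still be done by hand.
name="metadata-and-sequences-from-BioProjectIDs"
slug=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
printf '%s\n' "$slug"   # metadata-and-sequences-from-bioprojectids
```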

Copilot generated this review using guidance from repository custom instructions.
@@ -0,0 +1,517 @@
{
"a_galaxy_workflow": "true",
"annotation": "This workflow takes BioProject IDs as input and is able to retrieve SRA tables and FASTQ files from them.",
Copilot AI Mar 27, 2026

The workflow annotation does not follow the repository’s required phrasing (it should start with “This workflow does/runs/performs …” and ideally mention the main outputs). Please reword this annotation to match the expected format.

Suggested change
"annotation": "This workflow takes BioProject IDs as input and is able to retrieve SRA tables and FASTQ files from them.",
"annotation": "This workflow performs retrieval of SRA metadata tables and FASTQ sequence files from input BioProject IDs.",

Copilot uses AI. Check for mistakes.
Comment on lines +109 to +115
{
"description": "Enable it to display detailed metadata table",
"name": "--detailed (metadata download)"
}
],
"label": "--detailed (metadata download)",
"name": "Input parameter",
Copilot AI Mar 27, 2026

This workflow input label/name (--detailed (metadata download)) is CLI-flag styled and not human-readable. Please rename it to a descriptive label and update the corresponding key in *-tests.yml accordingly.

BioProject IDs:
class: File
path: test-data/test1_single_prj_pe.txt
filetype: tabular
Copilot AI Mar 27, 2026

Test 1 declares the BioProject ID list input as filetype: tabular, while Test 2 uses filetype: txt. Given this input is a plain list of IDs (no header), consider using txt consistently across tests (and align with the workflow input datatype).

Suggested change
filetype: tabular
filetype: txt

registries:
- url: https://workflowhub.eu
project: iwc
workflow: metadata-and-sequences-from-BioProjectsIDs/main
Copilot AI Mar 27, 2026

The WorkflowHub workflow: path has a typo/inconsistency (metadata-and-sequences-from-BioProjectsIDs/main): it doesn’t match this directory name (metadata-and-sequences-from-BioProjectIDs) and will likely break publication/registration. Please update it to the correct (and ideally lowercase) workflow slug.

Suggested change
workflow: metadata-and-sequences-from-BioProjectsIDs/main
workflow: metadata-and-sequences-from-bioprojectids/main

Comment thread .idea/.gitignore Outdated
Comment on lines +1 to +8
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP Client requests
/httpRequests/
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml
Copilot AI Mar 27, 2026

The repository-level .gitignore does not currently ignore JetBrains .idea/, and this PR adds .idea/.gitignore. Project-specific IDE metadata generally shouldn’t be committed; instead, remove the .idea/ directory from the repo and add .idea/ to the top-level .gitignore if needed.

Suggested change
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP Client requests
/httpRequests/
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml
# Ignore all JetBrains IDE project files in this directory
/*
!.gitignore
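The alternative cleanup the comment describes (untracking `.idea/` rather than ignoring files inside it) can be sketched as follows. This assumes a standard git checkout and is run from the repository root; the git steps are shown as comments so the ignore-rule step can stand alone.

```shell
# Sketch: stop tracking the IDE directory, then ignore it at the top level.
# Untracking (keeps the local copy, removes it from the repo):
#   git rm -r --cached .idea
#   git commit -m "Remove committed .idea/ directory"

# Append '.idea/' to the top-level .gitignore only if not already present:
grep -qxF '.idea/' .gitignore 2>/dev/null || echo '.idea/' >> .gitignore
```

Running the last line twice is safe: the `grep -qxF` guard keeps the rule from being appended more than once.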

Comment on lines +15 to +16
"name": "Metadata and Sequences from BioProjectIDs",
"readme": "# Metadata and Sequences from BioProjectIDs\n\n## Rationale\nIn order to promote re-analysis of publicly available sequences from INSDC databases, we propose *Metadata and Sequences from BioProjectIDs* a Galaxy workflow that starting by a list of valid BioProject IDs (e.g. PRJNA....., PRJEB.....) is able to manage data and metadata download.\n\n## Usage\nUpload a text file in which there is a BioProject ID for each row and run the workflow.\n\nThere is also the possibility to set optional options to regulate behaviour of metadata and data download.\n",
Copilot AI Mar 27, 2026

The workflow name uses “BioProjectIDs” without a space, which is not human-readable and is inconsistent with the “BioProject IDs” wording used elsewhere. Consider renaming to use spaces (e.g., “BioProject IDs”) and ensure related filenames/folder name follow the same convention.

Comment on lines +55 to +61
{
"description": "Enable it to include assay type in output",
"name": "--assay (metadata download)"
}
],
"label": "--assay (metadata download)",
"name": "Input parameter",
Copilot AI Mar 27, 2026

This workflow input label/name (--assay (metadata download)) is CLI-flag styled and not human-readable. Please rename it to a descriptive label (e.g., “Include assay type in metadata”) and update the corresponding key in *-tests.yml accordingly.

Comment on lines +376 to +381
"workflow_outputs": [
{
"label": "metadata_file",
"output_name": "metadata_file",
"uuid": "936ed258-6b4a-412c-9c61-b475d5da1251"
}
Copilot AI Mar 27, 2026

The workflow output label metadata_file is not human-readable (underscores) and won’t render nicely in Galaxy or in test definitions. Please change workflow_outputs[].label to a human-readable phrase and then update the matching output name used in metadata-and-sequences-from-BioProjectIDs-tests.yml.

Comment on lines +500 to +510
"workflow_outputs": [
{
"label": "paired_end_collection",
"output_name": "paired_end_collection",
"uuid": "d08a8412-887b-4800-a200-916745f7c65e"
},
{
"label": "single_end_collection",
"output_name": "single_end_collection",
"uuid": "9deedcc0-92ff-4845-8cfc-a9a1e70d9375"
}
Copilot AI Mar 27, 2026

The workflow output labels paired_end_collection / single_end_collection are not human-readable (underscores). Please rename these workflow_outputs[].label values to human-readable phrases and update the corresponding output keys in metadata-and-sequences-from-BioProjectIDs-tests.yml to match exactly.

@mvdbeek
Member

mvdbeek commented Mar 27, 2026

Thanks @gdefazio, this seems quite useful. I note that there is some overlap with https://iwc.galaxyproject.org/workflow/sra-manifest-to-concatenated-fastqs-main/ and https://iwc.galaxyproject.org/workflow/parallel-accession-download-main/. Could you use either of these workflows as a subworkflow to handle the downloads based on the manifest you're generating ?

@gdefazio
Author

Thanks @gdefazio, this seems quite useful. I note that there is some overlap with https://iwc.galaxyproject.org/workflow/sra-manifest-to-concatenated-fastqs-main/ and https://iwc.galaxyproject.org/workflow/parallel-accession-download-main/. Could you use either of these workflows as a subworkflow to handle the downloads based on the manifest you're generating ?

Hi @mvdbeek, and thank you for your review and comments. I'm new to Galaxy workflow implementation. Regarding your question, I need a few days to better understand how your suggestions could benefit this workflow. Parallelization will probably help, but please let me take an in-depth look first.
Thanks a lot.

@gdefazio
Author

gdefazio commented Apr 8, 2026

Hi @mvdbeek, I tried to integrate parallel-accession-download-main, but I had a lot of problems with job scheduling for that workflow's apply-rules tool because of its deeply nested structure. I then integrated fasterq-dump into my workflow instead, and thanks to your suggestion I think the workflow is better now. I've been waiting for CI to start since yesterday, but it seems stuck.

@github-actions

github-actions bot commented Apr 8, 2026

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 2
Passed 0
Error 0
Failure 2
Skipped 0
Failed Tests
  • ❌ metadata-and-sequences-from-bioproject-ids.ga_0

    Problems:

    • Output collection 'PE output': failed to find identifier 'split_file_000000.txt' in the tool generated elements ['SRR37073390']
      
    • Output collection 'SE output': failed to find identifier 'split_file_000000.txt' in the tool generated elements []
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: assay (metadata download):

        • step_state: scheduled
      • Step 3: desc (metadata download):

        • step_state: scheduled
      • Step 4: detailed (metadata download):

        • step_state: scheduled
      • Step 5: expand (metadata download):

        • step_state: scheduled
      • Step 6: Separate BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmpz_szomsr/files/4/a/8/dataset_4a855306-4b4e-465e-80f5-1ee96c06d496.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "069c75a8338711f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 1, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 7: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "069c75a8338711f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 8: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "069c75a8338711f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
      • Step 9: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpz_szomsr/job_working_directory/000/5/configs/tmpyet33220' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "069c75a8338711f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 3, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 10: fasterq-dump (toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy1):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-2b04072095278721dc9a5772e61e406f399b6030:a95f0e0ff448eede323315668bfa8ee64c918ebb-0

            Command Line:

            • set -o | grep -q pipefail && set -o pipefail;  mkdir -p ~/.ncbi && cp '/tmp/tmpz_szomsr/job_working_directory/000/6/configs/tmpc6vs39sa' ~/.ncbi/user-settings.mkfg &&   export SRA_PREFETCH_RETRIES=3 && export SRA_PREFETCH_ATTEMPT=1 &&    grep '^[[:space:]]*[E|S|D]RR[0-9]\{1,\}[[:space:]]*$' '/tmp/tmpz_szomsr/files/1/f/7/dataset_1f7a6a71-9a74-46e6-b18f-965339b35316.dat' > accessions && for acc in $(cat ./accessions); do ( echo "Downloading accession: $acc..." &&  while [ $SRA_PREFETCH_ATTEMPT -le $SRA_PREFETCH_RETRIES ] ; do fasterq-dump "$acc" -e ${GALAXY_SLOTS:-1} -t ${TMPDIR} --seq-defline '@$ac.$sn/$ri' --qual-defline '+' --split-3 --skip-technical 2>&1 | tee -a '/tmp/tmpz_szomsr/job_working_directory/000/6/outputs/dataset_a776ab45-57cf-4112-acfd-bbc438d51eb3.dat'; if [ $? == 0 ] && [ $(ls *.fastq | wc -l) -ge 1 ]; then break ; else echo "Prefetch attempt $SRA_PREFETCH_ATTEMPT of $SRA_PREFETCH_RETRIES exited with code $?" ; SRA_PREFETCH_ATTEMPT=`expr $SRA_PREFETCH_ATTEMPT + 1` ; sleep 1 ; fi ; done && mkdir -p output && mkdir -p outputOther && count="$(ls *.fastq | wc -l)" && echo "There are $count fastq files" && data=($(ls *.fastq)) && if [ "$count" -eq 1 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"__single.fastqsanger.gz && rm "${data[0]}"; elif [ "--split-3" = "--split-3" ]; then if [ -e "${acc}".fastq ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${acc}".fastq > outputOther/"${acc}"__single.fastqsanger.gz; fi && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_1.fastq > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_2.fastq > output/"${acc}"_reverse.fastqsanger.gz && rm "${acc}"*.fastq; elif [ "$count" -eq 2 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${data[1]}" > output/"${acc}"_reverse.fastqsanger.gz && rm "${data[0]}" && rm "${data[1]}"; else for file in ${data[*]}; do pigz -cqp ${GALAXY_SLOTS:-1} "$file" > outputOther/"$file"sanger.gz && rm "$file"; done; fi;  ); done; echo "Done with all accessions."

            Exit Code:

            • 0

            Standard Output:

            • Downloading accession: SRR37073390...
              spots read      : 13,266,400
              reads read      : 26,532,800
              reads written   : 26,532,800
              There are 2 fastq files
              Done with all accessions.
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "069c75a8338711f19acb7c1e524174c7"
              adv {"minlen": null, "seq_defline": "@$ac.$sn/$ri", "skip_technical": true, "split": "--split-3"}
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input {"__current_case__": 2, "file_list": {"values": [{"id": 4, "src": "dce"}]}, "input_select": "file_list"}
      • Step 11: PE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "069c75a8338711f19acb7c1e524174c7"
              input {"values": [{"id": 5, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}, {"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [2], "connectable": true, "is_workflow": false, "type": "paired_identifier"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier2", "warn": null}]}
      • Step 12: SE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "069c75a8338711f19acb7c1e524174c7"
              input {"values": [{"id": 6, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}]}
    • Other invocation details
      • history_id

        • 86b054d6a33410bd
      • history_state

        • ok
      • invocation_id

        • 86b054d6a33410bd
      • invocation_state

        • scheduled
      • workflow_id

        • ce57138c0fb39786
  • ❌ metadata-and-sequences-from-bioproject-ids.ga_1

    Problems:

    • Output collection 'PE output': failed to find identifier 'elements' in the tool generated elements ['SRR37273408', 'SRR37073390']
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: assay (metadata download):

        • step_state: scheduled
      • Step 3: desc (metadata download):

        • step_state: scheduled
      • Step 4: detailed (metadata download):

        • step_state: scheduled
      • Step 5: expand (metadata download):

        • step_state: scheduled
      • Step 6: Separate BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmpz_szomsr/files/4/d/6/dataset_4d62222c-7cc4-4bf7-956b-cd0f70bed67b.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 11, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 7: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
          • Job 2:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 8: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1425250' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1425250", "selector": "metadata"}
              dbkey "?"
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
      • Step 9: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpz_szomsr/job_working_directory/000/15/configs/tmptv3bk39e' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 19, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpz_szomsr/job_working_directory/000/16/configs/tmpc9wqcwhj' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 20, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 10: fasterq-dump (toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy1):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-2b04072095278721dc9a5772e61e406f399b6030:a95f0e0ff448eede323315668bfa8ee64c918ebb-0

            Command Line:

            • set -o | grep -q pipefail && set -o pipefail;  mkdir -p ~/.ncbi && cp '/tmp/tmpz_szomsr/job_working_directory/000/17/configs/tmpajb5gv5p' ~/.ncbi/user-settings.mkfg &&   export SRA_PREFETCH_RETRIES=3 && export SRA_PREFETCH_ATTEMPT=1 &&    grep '^[[:space:]]*[E|S|D]RR[0-9]\{1,\}[[:space:]]*$' '/tmp/tmpz_szomsr/files/1/0/e/dataset_10e4657b-80dd-4e17-b24d-db685187c653.dat' > accessions && for acc in $(cat ./accessions); do ( echo "Downloading accession: $acc..." &&  while [ $SRA_PREFETCH_ATTEMPT -le $SRA_PREFETCH_RETRIES ] ; do fasterq-dump "$acc" -e ${GALAXY_SLOTS:-1} -t ${TMPDIR} --seq-defline '@$ac.$sn/$ri' --qual-defline '+' --split-3 --skip-technical 2>&1 | tee -a '/tmp/tmpz_szomsr/job_working_directory/000/17/outputs/dataset_45b08ec8-5468-4b6b-88a0-da5186449050.dat'; if [ $? == 0 ] && [ $(ls *.fastq | wc -l) -ge 1 ]; then break ; else echo "Prefetch attempt $SRA_PREFETCH_ATTEMPT of $SRA_PREFETCH_RETRIES exited with code $?" ; SRA_PREFETCH_ATTEMPT=`expr $SRA_PREFETCH_ATTEMPT + 1` ; sleep 1 ; fi ; done && mkdir -p output && mkdir -p outputOther && count="$(ls *.fastq | wc -l)" && echo "There are $count fastq files" && data=($(ls *.fastq)) && if [ "$count" -eq 1 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"__single.fastqsanger.gz && rm "${data[0]}"; elif [ "--split-3" = "--split-3" ]; then if [ -e "${acc}".fastq ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${acc}".fastq > outputOther/"${acc}"__single.fastqsanger.gz; fi && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_1.fastq > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_2.fastq > output/"${acc}"_reverse.fastqsanger.gz && rm "${acc}"*.fastq; elif [ "$count" -eq 2 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${data[1]}" > output/"${acc}"_reverse.fastqsanger.gz && rm "${data[0]}" && rm "${data[1]}"; else for file in ${data[*]}; do pigz -cqp ${GALAXY_SLOTS:-1} "$file" > outputOther/"$file"sanger.gz && rm "$file"; done; fi;  ); done; echo "Done with all accessions."

            Exit Code:

            • 0

            Standard Output:

            • Downloading accession: SRR37273407...
              spots read      : 45,419
              reads read      : 45,419
              reads written   : 45,419
              There are 1 fastq files
              Downloading accession: SRR37273408...
              spots read      : 2,935,209
              reads read      : 5,870,418
              reads written   : 5,870,418
              There are 2 fastq files
              Done with all accessions.
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              adv {"minlen": null, "seq_defline": "@$ac.$sn/$ri", "skip_technical": true, "split": "--split-3"}
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input {"__current_case__": 2, "file_list": {"values": [{"id": 21, "src": "dce"}]}, "input_select": "file_list"}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-2b04072095278721dc9a5772e61e406f399b6030:a95f0e0ff448eede323315668bfa8ee64c918ebb-0

            Command Line:

            • set -o | grep -q pipefail && set -o pipefail;  mkdir -p ~/.ncbi && cp '/tmp/tmpz_szomsr/job_working_directory/000/18/configs/tmpxy4nxptx' ~/.ncbi/user-settings.mkfg &&   export SRA_PREFETCH_RETRIES=3 && export SRA_PREFETCH_ATTEMPT=1 &&    grep '^[[:space:]]*[E|S|D]RR[0-9]\{1,\}[[:space:]]*$' '/tmp/tmpz_szomsr/files/1/4/d/dataset_14daf8c3-ef50-4a7f-b704-3c283cb14b3b.dat' > accessions && for acc in $(cat ./accessions); do ( echo "Downloading accession: $acc..." &&  while [ $SRA_PREFETCH_ATTEMPT -le $SRA_PREFETCH_RETRIES ] ; do fasterq-dump "$acc" -e ${GALAXY_SLOTS:-1} -t ${TMPDIR} --seq-defline '@$ac.$sn/$ri' --qual-defline '+' --split-3 --skip-technical 2>&1 | tee -a '/tmp/tmpz_szomsr/job_working_directory/000/18/outputs/dataset_7f8aa58d-6820-4082-a715-1a1c87b15752.dat'; if [ $? == 0 ] && [ $(ls *.fastq | wc -l) -ge 1 ]; then break ; else echo "Prefetch attempt $SRA_PREFETCH_ATTEMPT of $SRA_PREFETCH_RETRIES exited with code $?" ; SRA_PREFETCH_ATTEMPT=`expr $SRA_PREFETCH_ATTEMPT + 1` ; sleep 1 ; fi ; done && mkdir -p output && mkdir -p outputOther && count="$(ls *.fastq | wc -l)" && echo "There are $count fastq files" && data=($(ls *.fastq)) && if [ "$count" -eq 1 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"__single.fastqsanger.gz && rm "${data[0]}"; elif [ "--split-3" = "--split-3" ]; then if [ -e "${acc}".fastq ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${acc}".fastq > outputOther/"${acc}"__single.fastqsanger.gz; fi && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_1.fastq > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_2.fastq > output/"${acc}"_reverse.fastqsanger.gz && rm "${acc}"*.fastq; elif [ "$count" -eq 2 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${data[1]}" > output/"${acc}"_reverse.fastqsanger.gz && rm "${data[0]}" && rm "${data[1]}"; else for file in ${data[*]}; do pigz -cqp ${GALAXY_SLOTS:-1} "$file" > outputOther/"$file"sanger.gz && rm "$file"; done; fi;  ); done; echo "Done with all accessions."

            Exit Code:

            • 0

            Standard Output:

            • Downloading accession: SRR37073390...
              spots read      : 13,266,400
              reads read      : 26,532,800
              reads written   : 26,532,800
              There are 2 fastq files
              Done with all accessions.
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              adv {"minlen": null, "seq_defline": "@$ac.$sn/$ri", "skip_technical": true, "split": "--split-3"}
              chromInfo "/tmp/tmpz_szomsr/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input {"__current_case__": 2, "file_list": {"values": [{"id": 22, "src": "dce"}]}, "input_select": "file_list"}
      • Step 11: PE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              input {"values": [{"id": 15, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}, {"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [2], "connectable": true, "is_workflow": false, "type": "paired_identifier"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier2", "warn": null}]}
      • Step 12: SE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "bd64d292338911f19acb7c1e524174c7"
              input {"values": [{"id": 16, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}]}
    • Other invocation details
      • history_id

        • ce57138c0fb39786
      • history_state

        • ok
      • invocation_id

        • ce57138c0fb39786
      • invocation_state

        • scheduled
      • workflow_id

        • ce57138c0fb39786

@github-actions

github-actions bot commented Apr 9, 2026

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 2
Passed 0
Error 0
Failure 2
Skipped 0
Failed Tests
  • ❌ metadata-and-sequences-from-bioproject-ids.ga_0

    Problems:

    • Output collection 'SE output': failed to find identifier 'elements' in the tool generated elements []
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: assay (metadata download):

        • step_state: scheduled
      • Step 3: desc (metadata download):

        • step_state: scheduled
      • Step 4: detailed (metadata download):

        • step_state: scheduled
      • Step 5: expand (metadata download):

        • step_state: scheduled
      • Step 6: Separate BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmpp9bzlr6o/files/b/7/c/dataset_b7c4279e-a912-4d27-a28e-d90521fdc0e6.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "2453dbfe33f011f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 1, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 7: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "2453dbfe33f011f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 8: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "2453dbfe33f011f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
      • Step 9: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpp9bzlr6o/job_working_directory/000/5/configs/tmps1qnsbmd' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "2453dbfe33f011f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 3, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 10: fasterq-dump (toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy1):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-2b04072095278721dc9a5772e61e406f399b6030:a95f0e0ff448eede323315668bfa8ee64c918ebb-0

            Command Line:

            • set -o | grep -q pipefail && set -o pipefail;  mkdir -p ~/.ncbi && cp '/tmp/tmpp9bzlr6o/job_working_directory/000/6/configs/tmpitbe8dqy' ~/.ncbi/user-settings.mkfg &&   export SRA_PREFETCH_RETRIES=3 && export SRA_PREFETCH_ATTEMPT=1 &&    grep '^[[:space:]]*[E|S|D]RR[0-9]\{1,\}[[:space:]]*$' '/tmp/tmpp9bzlr6o/files/1/6/b/dataset_16b63cc2-f1ee-4ae2-a107-23d2808d9806.dat' > accessions && for acc in $(cat ./accessions); do ( echo "Downloading accession: $acc..." &&  while [ $SRA_PREFETCH_ATTEMPT -le $SRA_PREFETCH_RETRIES ] ; do fasterq-dump "$acc" -e ${GALAXY_SLOTS:-1} -t ${TMPDIR} --seq-defline '@$ac.$sn/$ri' --qual-defline '+' --split-3 --skip-technical 2>&1 | tee -a '/tmp/tmpp9bzlr6o/job_working_directory/000/6/outputs/dataset_d0db0a2f-b555-4e37-a613-107a708d5d42.dat'; if [ $? == 0 ] && [ $(ls *.fastq | wc -l) -ge 1 ]; then break ; else echo "Prefetch attempt $SRA_PREFETCH_ATTEMPT of $SRA_PREFETCH_RETRIES exited with code $?" ; SRA_PREFETCH_ATTEMPT=`expr $SRA_PREFETCH_ATTEMPT + 1` ; sleep 1 ; fi ; done && mkdir -p output && mkdir -p outputOther && count="$(ls *.fastq | wc -l)" && echo "There are $count fastq files" && data=($(ls *.fastq)) && if [ "$count" -eq 1 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"__single.fastqsanger.gz && rm "${data[0]}"; elif [ "--split-3" = "--split-3" ]; then if [ -e "${acc}".fastq ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${acc}".fastq > outputOther/"${acc}"__single.fastqsanger.gz; fi && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_1.fastq > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_2.fastq > output/"${acc}"_reverse.fastqsanger.gz && rm "${acc}"*.fastq; elif [ "$count" -eq 2 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${data[1]}" > output/"${acc}"_reverse.fastqsanger.gz && rm "${data[0]}" && rm "${data[1]}"; else for file in ${data[*]}; do pigz -cqp ${GALAXY_SLOTS:-1} "$file" > outputOther/"$file"sanger.gz && rm "$file"; done; fi;  ); done; echo "Done with all accessions."

            Exit Code:

            • 0

            Standard Output:

            • Downloading accession: SRR37073390...
              spots read      : 13,266,400
              reads read      : 26,532,800
              reads written   : 26,532,800
              There are 2 fastq files
              Done with all accessions.
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "2453dbfe33f011f1b6277c1e52dd0599"
              adv {"minlen": null, "seq_defline": "@$ac.$sn/$ri", "skip_technical": true, "split": "--split-3"}
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input {"__current_case__": 2, "file_list": {"values": [{"id": 4, "src": "dce"}]}, "input_select": "file_list"}
      • Step 11: PE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "2453dbfe33f011f1b6277c1e52dd0599"
              input {"values": [{"id": 5, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}, {"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [2], "connectable": true, "is_workflow": false, "type": "paired_identifier"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier2", "warn": null}]}
      • Step 12: SE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "2453dbfe33f011f1b6277c1e52dd0599"
              input {"values": [{"id": 6, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}]}
    • Other invocation details
      • history_id

        • 012fa4c3251c84b5
      • history_state

        • ok
      • invocation_id

        • 012fa4c3251c84b5
      • invocation_state

        • scheduled
      • workflow_id

        • 77a94017f017a81d
  • ❌ metadata-and-sequences-from-bioproject-ids.ga_1

    Problems:

    • Output with path /tmp/tmplbcb1r0_/SRR37273407__260c9b2a-f2f3-4886-bc47-4776a8977a46.fastqsanger.gz different than expected, difference (using contains):
      ( /home/runner/work/iwc/iwc/workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_SRR37273407_forward.fastq v. /tmp/tmpm678dk3jtest2_SRR37273407_forward.fastq )
      Failed to find '@SRR37273407.1 81826be3-9349-4299-8ccd-e1900043df2e/1' in history data. (lines_diff=0).
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: assay (metadata download):

        • step_state: scheduled
      • Step 3: desc (metadata download):

        • step_state: scheduled
      • Step 4: detailed (metadata download):

        • step_state: scheduled
      • Step 5: expand (metadata download):

        • step_state: scheduled
      • Step 6: Separate BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmpp9bzlr6o/files/d/1/a/dataset_d1a032f9-a04d-44eb-9f2b-7e6d4f7123c5.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 11, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 7: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
          • Job 2:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 8: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1425250' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1425250", "selector": "metadata"}
              dbkey "?"
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
      • Step 9: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpp9bzlr6o/job_working_directory/000/15/configs/tmp4r_at51r' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 19, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmpp9bzlr6o/job_working_directory/000/16/configs/tmpmirdlkji' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 20, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 10: fasterq-dump (toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy1):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-2b04072095278721dc9a5772e61e406f399b6030:a95f0e0ff448eede323315668bfa8ee64c918ebb-0

            Command Line:

• set -o | grep -q pipefail && set -o pipefail;  mkdir -p ~/.ncbi && cp '/tmp/tmpp9bzlr6o/job_working_directory/000/17/configs/tmpl93wlxn3' ~/.ncbi/user-settings.mkfg &&   export SRA_PREFETCH_RETRIES=3 && export SRA_PREFETCH_ATTEMPT=1 &&    grep '^[[:space:]]*[E|S|D]RR[0-9]\{1,\}[[:space:]]*$' '/tmp/tmpp9bzlr6o/files/f/3/1/dataset_f3186fe3-7d43-4b89-ac99-42003aa06e60.dat' > accessions && for acc in $(cat ./accessions); do ( echo "Downloading accession: $acc..." &&  while [ $SRA_PREFETCH_ATTEMPT -le $SRA_PREFETCH_RETRIES ] ; do fasterq-dump "$acc" -e ${GALAXY_SLOTS:-1} -t ${TMPDIR} --seq-defline '@$ac.$sn/$ri' --qual-defline '+' --split-3 --skip-technical 2>&1 | tee -a '/tmp/tmpp9bzlr6o/job_working_directory/000/17/outputs/dataset_208c73ac-f0ff-4ec6-9607-c662a8ccff79.dat'; if [ $? == 0 ] && [ $(ls *.fastq | wc -l) -ge 1 ]; then break ; else echo "Prefetch attempt $SRA_PREFETCH_ATTEMPT of $SRA_PREFETCH_RETRIES exited with code $?" ; SRA_PREFETCH_ATTEMPT=`expr $SRA_PREFETCH_ATTEMPT + 1` ; sleep 1 ; fi ; done && mkdir -p output && mkdir -p outputOther && count="$(ls *.fastq | wc -l)" && echo "There are $count fastq files" && data=($(ls *.fastq)) && if [ "$count" -eq 1 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"__single.fastqsanger.gz && rm "${data[0]}"; elif [ "--split-3" = "--split-3" ]; then if [ -e "${acc}".fastq ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${acc}".fastq > outputOther/"${acc}"__single.fastqsanger.gz; fi && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_1.fastq > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_2.fastq > output/"${acc}"_reverse.fastqsanger.gz && rm "${acc}"*.fastq; elif [ "$count" -eq 2 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${data[1]}" > output/"${acc}"_reverse.fastqsanger.gz && rm "${data[0]}" && rm "${data[1]}"; else for file in ${data[*]}; do pigz -cqp ${GALAXY_SLOTS:-1} "$file" > outputOther/"$file"sanger.gz && rm "$file"; done; fi;  ); done; echo "Done with all accessions."

            Exit Code:

            • 0

            Standard Output:

            • Downloading accession: SRR37273407...
              spots read      : 45,419
              reads read      : 45,419
              reads written   : 45,419
              There are 1 fastq files
              Downloading accession: SRR37273408...
              spots read      : 2,935,209
              reads read      : 5,870,418
              reads written   : 5,870,418
              There are 2 fastq files
              Done with all accessions.
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              adv {"minlen": null, "seq_defline": "@$ac.$sn/$ri", "skip_technical": true, "split": "--split-3"}
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input {"__current_case__": 2, "file_list": {"values": [{"id": 21, "src": "dce"}]}, "input_select": "file_list"}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-2b04072095278721dc9a5772e61e406f399b6030:a95f0e0ff448eede323315668bfa8ee64c918ebb-0

            Command Line:

• set -o | grep -q pipefail && set -o pipefail;  mkdir -p ~/.ncbi && cp '/tmp/tmpp9bzlr6o/job_working_directory/000/18/configs/tmpfp5je7d0' ~/.ncbi/user-settings.mkfg &&   export SRA_PREFETCH_RETRIES=3 && export SRA_PREFETCH_ATTEMPT=1 &&    grep '^[[:space:]]*[E|S|D]RR[0-9]\{1,\}[[:space:]]*$' '/tmp/tmpp9bzlr6o/files/e/6/a/dataset_e6af459c-742f-4755-8ab7-b0364b8bc775.dat' > accessions && for acc in $(cat ./accessions); do ( echo "Downloading accession: $acc..." &&  while [ $SRA_PREFETCH_ATTEMPT -le $SRA_PREFETCH_RETRIES ] ; do fasterq-dump "$acc" -e ${GALAXY_SLOTS:-1} -t ${TMPDIR} --seq-defline '@$ac.$sn/$ri' --qual-defline '+' --split-3 --skip-technical 2>&1 | tee -a '/tmp/tmpp9bzlr6o/job_working_directory/000/18/outputs/dataset_fdd9e3f8-3b05-4c08-be9e-8b81809ddc7e.dat'; if [ $? == 0 ] && [ $(ls *.fastq | wc -l) -ge 1 ]; then break ; else echo "Prefetch attempt $SRA_PREFETCH_ATTEMPT of $SRA_PREFETCH_RETRIES exited with code $?" ; SRA_PREFETCH_ATTEMPT=`expr $SRA_PREFETCH_ATTEMPT + 1` ; sleep 1 ; fi ; done && mkdir -p output && mkdir -p outputOther && count="$(ls *.fastq | wc -l)" && echo "There are $count fastq files" && data=($(ls *.fastq)) && if [ "$count" -eq 1 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"__single.fastqsanger.gz && rm "${data[0]}"; elif [ "--split-3" = "--split-3" ]; then if [ -e "${acc}".fastq ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${acc}".fastq > outputOther/"${acc}"__single.fastqsanger.gz; fi && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_1.fastq > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_2.fastq > output/"${acc}"_reverse.fastqsanger.gz && rm "${acc}"*.fastq; elif [ "$count" -eq 2 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${data[1]}" > output/"${acc}"_reverse.fastqsanger.gz && rm "${data[0]}" && rm "${data[1]}"; else for file in ${data[*]}; do pigz -cqp ${GALAXY_SLOTS:-1} "$file" > outputOther/"$file"sanger.gz && rm "$file"; done; fi;  ); done; echo "Done with all accessions."

            Exit Code:

            • 0

            Standard Output:

            • Downloading accession: SRR37073390...
              spots read      : 13,266,400
              reads read      : 26,532,800
              reads written   : 26,532,800
              There are 2 fastq files
              Done with all accessions.
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              adv {"minlen": null, "seq_defline": "@$ac.$sn/$ri", "skip_technical": true, "split": "--split-3"}
              chromInfo "/tmp/tmpp9bzlr6o/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input {"__current_case__": 2, "file_list": {"values": [{"id": 22, "src": "dce"}]}, "input_select": "file_list"}
      • Step 11: PE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              input {"values": [{"id": 15, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}, {"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [2], "connectable": true, "is_workflow": false, "type": "paired_identifier"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier2", "warn": null}]}
      • Step 12: SE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "131fc0e833f311f1b6277c1e52dd0599"
              input {"values": [{"id": 16, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}]}
    • Other invocation details
      • history_id

        • 77a94017f017a81d
      • history_state

        • ok
      • invocation_id

        • 77a94017f017a81d
      • invocation_state

        • scheduled
      • workflow_id

        • 77a94017f017a81d

@github-actions

github-actions bot commented Apr 9, 2026

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 2
Passed 1
Error 0
Failure 1
Skipped 0
Failed Tests
  • ❌ metadata-and-sequences-from-bioproject-ids.ga_1

    Problems:

    • Output with path /tmp/tmpiyx5crd0/SRR37273407__07f40436-1128-4f3e-96cc-c31e640257d1.fastqsanger.gz different than expected, difference (using contains):
      ( /home/runner/work/iwc/iwc/workflows/data-fetching/metadata-and-sequences-from-BioProjectIDs/test-data/test2_SRR37273407_forward.fastq v. /tmp/tmpslh8o9r8test2_SRR37273407_forward.fastq )
      Failed to find '@SRR37273407.1 81826be3-9349-4299-8ccd-e1900043df2e/1' in history data. (lines_diff=0).
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: assay (metadata download):

        • step_state: scheduled
      • Step 3: desc (metadata download):

        • step_state: scheduled
      • Step 4: detailed (metadata download):

        • step_state: scheduled
      • Step 5: expand (metadata download):

        • step_state: scheduled
      • Step 6: Separate BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmprggu1by5/files/c/7/2/dataset_c72088f6-4594-49e9-8503-8c2cbf9dde4d.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 11, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 7: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
          • Job 2:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 8: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1425250' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1425250", "selector": "metadata"}
              dbkey "?"
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
      • Step 9: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmprggu1by5/job_working_directory/000/15/configs/tmp3fq4fogu' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 19, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmprggu1by5/job_working_directory/000/16/configs/tmpl7xngk0_' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 20, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 10: fasterq-dump (toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy1):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-2b04072095278721dc9a5772e61e406f399b6030:a95f0e0ff448eede323315668bfa8ee64c918ebb-0

            Command Line:

• set -o | grep -q pipefail && set -o pipefail;  mkdir -p ~/.ncbi && cp '/tmp/tmprggu1by5/job_working_directory/000/17/configs/tmpivqxcgl9' ~/.ncbi/user-settings.mkfg &&   export SRA_PREFETCH_RETRIES=3 && export SRA_PREFETCH_ATTEMPT=1 &&    grep '^[[:space:]]*[E|S|D]RR[0-9]\{1,\}[[:space:]]*$' '/tmp/tmprggu1by5/files/c/c/3/dataset_cc32357c-eff0-499f-9d16-850a744e46c2.dat' > accessions && for acc in $(cat ./accessions); do ( echo "Downloading accession: $acc..." &&  while [ $SRA_PREFETCH_ATTEMPT -le $SRA_PREFETCH_RETRIES ] ; do fasterq-dump "$acc" -e ${GALAXY_SLOTS:-1} -t ${TMPDIR} --seq-defline '@$ac.$sn/$ri' --qual-defline '+' --split-3 --skip-technical 2>&1 | tee -a '/tmp/tmprggu1by5/job_working_directory/000/17/outputs/dataset_954b7893-6cde-4954-b302-dc6a15e3c621.dat'; if [ $? == 0 ] && [ $(ls *.fastq | wc -l) -ge 1 ]; then break ; else echo "Prefetch attempt $SRA_PREFETCH_ATTEMPT of $SRA_PREFETCH_RETRIES exited with code $?" ; SRA_PREFETCH_ATTEMPT=`expr $SRA_PREFETCH_ATTEMPT + 1` ; sleep 1 ; fi ; done && mkdir -p output && mkdir -p outputOther && count="$(ls *.fastq | wc -l)" && echo "There are $count fastq files" && data=($(ls *.fastq)) && if [ "$count" -eq 1 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"__single.fastqsanger.gz && rm "${data[0]}"; elif [ "--split-3" = "--split-3" ]; then if [ -e "${acc}".fastq ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${acc}".fastq > outputOther/"${acc}"__single.fastqsanger.gz; fi && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_1.fastq > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_2.fastq > output/"${acc}"_reverse.fastqsanger.gz && rm "${acc}"*.fastq; elif [ "$count" -eq 2 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${data[1]}" > output/"${acc}"_reverse.fastqsanger.gz && rm "${data[0]}" && rm "${data[1]}"; else for file in ${data[*]}; do pigz -cqp ${GALAXY_SLOTS:-1} "$file" > outputOther/"$file"sanger.gz && rm "$file"; done; fi;  ); done; echo "Done with all accessions."

            Exit Code:

            • 0

            Standard Output:

            • Downloading accession: SRR37273407...
              spots read      : 45,419
              reads read      : 45,419
              reads written   : 45,419
              There are 1 fastq files
              Downloading accession: SRR37273408...
              spots read      : 2,935,209
              reads read      : 5,870,418
              reads written   : 5,870,418
              There are 2 fastq files
              Done with all accessions.
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              adv {"minlen": null, "seq_defline": "@$ac.$sn/$ri", "skip_technical": true, "split": "--split-3"}
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input {"__current_case__": 2, "file_list": {"values": [{"id": 21, "src": "dce"}]}, "input_select": "file_list"}
          • Job 2:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-2b04072095278721dc9a5772e61e406f399b6030:a95f0e0ff448eede323315668bfa8ee64c918ebb-0

            Command Line:

• set -o | grep -q pipefail && set -o pipefail;  mkdir -p ~/.ncbi && cp '/tmp/tmprggu1by5/job_working_directory/000/18/configs/tmpvvedcy4g' ~/.ncbi/user-settings.mkfg &&   export SRA_PREFETCH_RETRIES=3 && export SRA_PREFETCH_ATTEMPT=1 &&    grep '^[[:space:]]*[E|S|D]RR[0-9]\{1,\}[[:space:]]*$' '/tmp/tmprggu1by5/files/2/3/7/dataset_237d78a3-05c3-4f19-b85c-9858f06f9929.dat' > accessions && for acc in $(cat ./accessions); do ( echo "Downloading accession: $acc..." &&  while [ $SRA_PREFETCH_ATTEMPT -le $SRA_PREFETCH_RETRIES ] ; do fasterq-dump "$acc" -e ${GALAXY_SLOTS:-1} -t ${TMPDIR} --seq-defline '@$ac.$sn/$ri' --qual-defline '+' --split-3 --skip-technical 2>&1 | tee -a '/tmp/tmprggu1by5/job_working_directory/000/18/outputs/dataset_38993ca6-5d33-4ee9-aead-089e45729e60.dat'; if [ $? == 0 ] && [ $(ls *.fastq | wc -l) -ge 1 ]; then break ; else echo "Prefetch attempt $SRA_PREFETCH_ATTEMPT of $SRA_PREFETCH_RETRIES exited with code $?" ; SRA_PREFETCH_ATTEMPT=`expr $SRA_PREFETCH_ATTEMPT + 1` ; sleep 1 ; fi ; done && mkdir -p output && mkdir -p outputOther && count="$(ls *.fastq | wc -l)" && echo "There are $count fastq files" && data=($(ls *.fastq)) && if [ "$count" -eq 1 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"__single.fastqsanger.gz && rm "${data[0]}"; elif [ "--split-3" = "--split-3" ]; then if [ -e "${acc}".fastq ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${acc}".fastq > outputOther/"${acc}"__single.fastqsanger.gz; fi && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_1.fastq > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_2.fastq > output/"${acc}"_reverse.fastqsanger.gz && rm "${acc}"*.fastq; elif [ "$count" -eq 2 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${data[1]}" > output/"${acc}"_reverse.fastqsanger.gz && rm "${data[0]}" && rm "${data[1]}"; else for file in ${data[*]}; do pigz -cqp ${GALAXY_SLOTS:-1} "$file" > outputOther/"$file"sanger.gz && rm "$file"; done; fi;  ); done; echo "Done with all accessions."

            Exit Code:

            • 0

            Standard Output:

            • Downloading accession: SRR37073390...
              spots read      : 13,266,400
              reads read      : 26,532,800
              reads written   : 26,532,800
              There are 2 fastq files
              Done with all accessions.
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              adv {"minlen": null, "seq_defline": "@$ac.$sn/$ri", "skip_technical": true, "split": "--split-3"}
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input {"__current_case__": 2, "file_list": {"values": [{"id": 22, "src": "dce"}]}, "input_select": "file_list"}
      • Step 11: PE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              input {"values": [{"id": 15, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}, {"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [2], "connectable": true, "is_workflow": false, "type": "paired_identifier"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier2", "warn": null}]}
      • Step 12: SE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "9c75356c343911f19acb7c1e5239ee4f"
              input {"values": [{"id": 16, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}]}
    • Other invocation details
      • history_id

        • 44565cbdc794da37
      • history_state

        • ok
      • invocation_id

        • 44565cbdc794da37
      • invocation_state

        • scheduled
      • workflow_id

        • 44565cbdc794da37
Passed Tests
  • ✅ metadata-and-sequences-from-bioproject-ids.ga_0

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BioProject IDs:

        • step_state: scheduled
      • Step 2: assay (metadata download):

        • step_state: scheduled
      • Step 3: desc (metadata download):

        • step_state: scheduled
      • Step 4: detailed (metadata download):

        • step_state: scheduled
      • Step 5: expand (metadata download):

        • step_state: scheduled
      • Step 6: Separate BioProject IDs (toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/python:3.5--2

            Command Line:

            • mkdir ./out && python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/2dae863c8f42/split_file_to_collection/split_file_to_collection.py' --out ./out --in '/tmp/tmprggu1by5/files/3/0/2/dataset_302fa693-3d4a-464c-a42f-c4bd80056417.dat' --ftype 'txt' --chunksize 1 --file_names 'split_file' --file_ext 'txt'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "txt"
              __workflow_invocation_uuid__ "8459a7ae343611f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              split_parms {"__current_case__": 5, "input": {"values": [{"id": 1, "src": "hda"}]}, "newfilenames": "split_file", "select_allocate": {"__current_case__": 2, "allocate": "byrow"}, "select_ftype": "txt", "select_mode": {"__current_case__": 0, "chunksize": "1", "mode": "chunk"}}
      • Step 7: Add BioProject IDs as parameters (param_value_from_file):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • cd ../; python _evaluate_expression_.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8459a7ae343611f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              param_type "text"
              remove_newlines true
      • Step 8: Metadata From BioProject IDs (toolshed.g2.bx.psu.edu/repos/iuc/pysradb_search/pysradb_search/2.5.1+galaxy0):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-e62c45964731bf241efeedb78776ebc093302f62:3c386467fc54c7b7a8da30b0705408fd927d49c0-0

            Command Line:

            • pysradb metadata 'PRJNA1417618' --saveto metadata_output.tsv   --detailed   && pysradb --version

            Exit Code:

            • 0

            Standard Output:

            • pysradb 2.5.1
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8459a7ae343611f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              conditional_subcommand {"__current_case__": 1, "assay": false, "desc": false, "detailed": true, "expand": false, "prj_id": "PRJNA1417618", "selector": "metadata"}
              dbkey "?"
      • Step 9: Run IDs extract (toolshed.g2.bx.psu.edu/repos/iuc/table_compute/table_compute/1.2.4+galaxy2):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-344874846f44224e5f0b7b741eacdddffe895d1e:d3fff24ee1297b4c3bcef48354c2a30f0c82007a-2

            Command Line:

            • cp '/tmp/tmprggu1by5/job_working_directory/000/5/configs/tmpv87ol3h0' ./userconfig.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/safety.py' ./safety.py && cp '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/table_compute/cd36d6e45e29/table_compute/scripts/table_compute.py' ./table_compute.py && python ./table_compute.py

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "tsv"
              __workflow_invocation_uuid__ "8459a7ae343611f19acb7c1e5239ee4f"
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              out_opts None
              precision "6"
              singtabop {"__current_case__": 0, "adv": {"header": null, "nrows": null, "skip_blank_lines": true, "skipfooter": null}, "col_row_names": ["has_col_names"], "input": {"values": [{"id": 3, "src": "dce"}]}, "use_type": "single", "user": {"__current_case__": 1, "mode": "select", "select_cols_wanted": "1", "select_keepdupe": null, "select_rows_wanted": null}}
      • Step 10: fasterq-dump (toolshed.g2.bx.psu.edu/repos/iuc/sra_tools/fasterq_dump/3.1.1+galaxy1):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Container:

            • quay.io/biocontainers/mulled-v2-2b04072095278721dc9a5772e61e406f399b6030:a95f0e0ff448eede323315668bfa8ee64c918ebb-0

            Command Line:

            • set -o | grep -q pipefail && set -o pipefail;  mkdir -p ~/.ncbi && cp '/tmp/tmprggu1by5/job_working_directory/000/6/configs/tmpw8qwyi1e' ~/.ncbi/user-settings.mkfg &&   export SRA_PREFETCH_RETRIES=3 && export SRA_PREFETCH_ATTEMPT=1 &&    grep '^[[:space:]]*[E|S|D]RR[0-9]\{1,\}[[:space:]]*$' '/tmp/tmprggu1by5/files/a/6/c/dataset_a6c5b46a-3848-4eb5-b8a7-499e424a4b18.dat' > accessions && for acc in $(cat ./accessions); do ( echo "Downloading accession: $acc..." &&  while [ $SRA_PREFETCH_ATTEMPT -le $SRA_PREFETCH_RETRIES ] ; do fasterq-dump "$acc" -e ${GALAXY_SLOTS:-1} -t ${TMPDIR} --seq-defline '@$ac.$sn/$ri' --qual-defline '+' --split-3 --skip-technical 2>&1 | tee -a '/tmp/tmprggu1by5/job_working_directory/000/6/outputs/dataset_e1815531-30b4-4565-91b8-33fe0d624c8f.dat'; if [ $? == 0 ] && [ $(ls *.fastq | wc -l) -ge 1 ]; then break ; else echo "Prefetch attempt $SRA_PREFETCH_ATTEMPT of $SRA_PREFETCH_RETRIES exited with code $?" ; SRA_PREFETCH_ATTEMPT=`expr $SRA_PREFETCH_ATTEMPT + 1` ; sleep 1 ; fi ; done && mkdir -p output && mkdir -p outputOther && count="$(ls *.fastq | wc -l)" && echo "There are $count fastq files" && data=($(ls *.fastq)) && if [ "$count" -eq 1 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"__single.fastqsanger.gz && rm "${data[0]}"; elif [ "--split-3" = "--split-3" ]; then if [ -e "${acc}".fastq ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${acc}".fastq > outputOther/"${acc}"__single.fastqsanger.gz; fi && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_1.fastq > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${acc}"_2.fastq > output/"${acc}"_reverse.fastqsanger.gz && rm "${acc}"*.fastq; elif [ "$count" -eq 2 ]; then pigz -cqp ${GALAXY_SLOTS:-1} "${data[0]}" > output/"${acc}"_forward.fastqsanger.gz && pigz -cqp ${GALAXY_SLOTS:-1} "${data[1]}" > output/"${acc}"_reverse.fastqsanger.gz && rm "${data[0]}" && rm "${data[1]}"; else for file in ${data[*]}; do pigz -cqp ${GALAXY_SLOTS:-1} "$file" > 
outputOther/"$file"sanger.gz && rm "$file"; done; fi;  ); done; echo "Done with all accessions."

            Exit Code:

            • 0

            Standard Output:

            • Downloading accession: SRR37073390...
              spots read      : 13,266,400
              reads read      : 26,532,800
              reads written   : 26,532,800
              There are 2 fastq files
              Done with all accessions.
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8459a7ae343611f19acb7c1e5239ee4f"
              adv {"minlen": null, "seq_defline": "@$ac.$sn/$ri", "skip_technical": true, "split": "--split-3"}
              chromInfo "/tmp/tmprggu1by5/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              input {"__current_case__": 2, "file_list": {"values": [{"id": 4, "src": "dce"}]}, "input_select": "file_list"}
      • Step 11: PE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "8459a7ae343611f19acb7c1e5239ee4f"
              input {"values": [{"id": 5, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}, {"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [2], "connectable": true, "is_workflow": false, "type": "paired_identifier"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier2", "warn": null}]}
      • Step 12: SE collection regularization (__APPLY_RULES__):

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "8459a7ae343611f19acb7c1e5239ee4f"
              input {"values": [{"id": 6, "src": "hdca"}]}
              rules {"mapping": [{"collapsible_value": {"__class__": "RuntimeValue"}, "columns": [1], "connectable": true, "editing": false, "is_workflow": false, "type": "list_identifiers"}], "rules": [{"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier0", "warn": null}, {"collapsible_value": {"__class__": "RuntimeValue"}, "connectable": true, "error": null, "is_workflow": false, "type": "add_column_metadata", "value": "identifier1", "warn": null}]}
    • Other invocation details
      • history_id

        • 2d78cb836665f42d
      • history_state

        • ok
      • invocation_id

        • 2d78cb836665f42d
      • invocation_state

        • scheduled
      • workflow_id

        • 44565cbdc794da37

@mvdbeek mvdbeek requested review from lldelisle and wm75 April 10, 2026 15:03
@lldelisle
Contributor

Hi @gdefazio ,
I am currently running the workflow to really understand what it is doing. I will give my comments before one week.

@lldelisle
Contributor

lldelisle commented Apr 13, 2026

Hi @gdefazio,
The workflow takes a list of project IDs as input, then generates one file per project ID with all the SRA accessions inside. This is given to fasterq-dump, and the resulting collections are then flattened into a list collection for SE data and a list:pair collection for PE data.
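For reference, the per-project metadata retrieval and run-ID extraction described above can be sketched in shell. This is a simplified local stand-in, not the workflow itself: the TSV written here mimics the `pysradb metadata <PRJ> --saveto metadata_output.tsv --detailed` output seen in the job log, where `run_accession` is the first column (matching the "Run IDs extract" step's select-column-1 setting).

```shell
# Stand-in for the pysradb --detailed metadata table (real output has many more columns).
printf 'run_accession\tstudy_accession\nSRR37073390\tPRJNA1417618\n' > metadata_output.tsv

# "Run IDs extract" equivalent: keep column 1, drop the header row.
cut -f1 metadata_output.tsv | tail -n +2 > run_accessions.txt
cat run_accessions.txt  # prints SRR37073390
```

The resulting accession list is what gets handed to fasterq-dump downstream.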

I think your workflow is a great idea, but I have a few problems with the current version:

  • The number of SRA accessions per project can be large, so the fasterq-dump step can take a very long time; I agree with @mvdbeek that it would be better to use the "Parallel accession download" subworkflow.
  • To get the SRA accessions the workflow takes the first column; however, if you do not set detailed (metadata download) to true, run_accession is still present in the file but is no longer the first column.

I have noticed something that could be improved:

  • The identifiers of the metadata tables should be the project IDs

So I propose here a new version: https://usegalaxy.eu/u/delislel/w/metadata-and-sequences-from-bioproject-ids-final-ld

What I did compared to your workflow:

  • I used Parallel accession download as a subworkflow, and as its input I simply concatenated all the files with SRA accessions into one (you lose the information about which project each accession comes from, but since your version flattens the collections at the end anyway, I would say the result is the same).
  • I used an awk step to find the column index of 'run_accession' and output the SRA list, in case the user has not set detailed (metadata download) to true.
  • I inserted a step to relabel the metadata tables with the project ID.
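Such an awk step might look like the following sketch (hypothetical; the actual step is in the linked workflow, and the input TSV here is a small stand-in). It locates the 'run_accession' column by header name rather than assuming it is first, so the extraction works whether or not detailed was set:

```shell
# Stand-in metadata table where run_accession is NOT the first column.
printf 'study_accession\trun_accession\nPRJNA1417618\tSRR37073390\n' > metadata_output.tsv

# Find the 'run_accession' column in the header, then print that column
# for every data row (header skipped via next).
awk -F'\t' 'NR==1 { for (i = 1; i <= NF; i++) if ($i == "run_accession") col = i; next }
            col { print $col }' metadata_output.tsv  # prints SRR37073390
```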

I let you try it and tell me what you think.

@mvdbeek is there a policy in case a workflow uses another workflow as a subworkflow? Do they need to be in the same directory?

Don't hesitate if you have questions or remarks, I would be happy to answer.

@mvdbeek
Member

mvdbeek commented Apr 13, 2026

Subworkflows are just embedded inside the parent. We're working on making those proper references too (galaxyproject/galaxy#21887) and on using symlinks in the IWC, but for now this is fine as is and will just work.

@gdefazio
Author

gdefazio commented Apr 13, 2026

Hi @lldelisle, and thank you for the valuable work on this workflow and for the positive feedback.
Let me answer you point by point.

  • The number of SRA accessions per project can be large, so the fasterq-dump step can take a very long time; I agree with @mvdbeek that it would be better to use the "Parallel accession download" subworkflow.

After Marcus's comment I tried to integrate "Parallel accession download" into this workflow, but I had some problems with the "apply rules" step because the nested structure did not match what was expected. Thank you for solving that.

  • To get the SRA accessions the workflow takes the first column; however, if you do not set detailed (metadata download) to true, run_accession is still present in the file but is no longer the first column.

I was not aware of that; thanks for solving it.

I have noticed something that could be improved:

  • The identifiers of the metadata tables should be the project IDs

So I propose here a new version: https://usegalaxy.eu/u/delislel/w/metadata-and-sequences-from-bioproject-ids-final-ld

What I did compared to your workflow:

  • I used Parallel accession download as a subworkflow, and as its input I simply concatenated all the files with SRA accessions into one (you lose the information about which project each accession comes from, but since your version flattens the collections at the end anyway, I would say the result is the same).

Do you suggest having one fastq collection for each BioProject ID?

  • I used an awk step to find the column index of 'run_accession' and output the SRA list, in case the user has not set detailed (metadata download) to true.
  • I inserted a step to relabel the metadata tables with the project ID.

I let you try it and tell me what you think.

I tried it and I think it is much better than my version. Thank you.

@mvdbeek is there a policy in case a workflow uses another workflow as subworkflow? Do they need to be in the same directory?

Don't hesitate if you have questions or remarks, I would be happy to answer.

Can I add you as WF author?

Thanks again for your effort.

@gdefazio
Author

Hi @lldelisle, please give me some feedback on the previous message when you have time. Thanks in advance.

@lldelisle
Contributor

Hi,
Yes, I would be happy to be an author.
I don't think it is necessary to return one collection per project.
