Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
version: 1.2
workflows:
  - name: main
    subclass: Galaxy
    publish: true
    primaryDescriptorPath: /metadata-and-sequences-from-bioproject-ids.ga
    testParameterFiles:
      - /metadata-and-sequences-from-bioproject-ids-tests.yml
    authors:
      - name: Giuseppe Defazio
        orcid: 0000-0002-9356-5224
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
version: '0.1'
registries:
  - url: https://workflowhub.eu
    project: iwc
    workflow: metadata-and-sequences-from-bioprojects-ids/main
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# Changelog

## [0.1] - 2026-03-23

- Added workflow
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# Metadata and Sequences from BioProject IDs

This workflow takes one or more BioProject IDs as input and retrieves the corresponding SRA tables and FASTQ files using pysradb and SRA fetching.
It is useful for meta-analysis and reanalysis, where metadata and sequence data must be collected from the BioProjects of several studies that share the same design.
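
As a rough illustration of the metadata step, the sketch below builds a `pysradb metadata` command line for one BioProject ID. The function name `pysradb_metadata_command` and the output file name are hypothetical. The flags mirror the `assay`, `desc`, `detailed` and `expand` workflow parameters, but how the Galaxy tool actually invokes pysradb is an assumption, not something this workflow confirms.

```python
import shlex


def pysradb_metadata_command(bioproject_id, detailed=True, assay=False,
                             desc=False, expand=False):
    """Build (but do not run) a pysradb command line for one BioProject ID.

    Hypothetical helper: the mapping between workflow parameters and
    pysradb flags is assumed, not taken from the workflow itself.
    """
    cmd = ["pysradb", "metadata", bioproject_id]
    # Append only the flags that are switched on.
    for flag, enabled in (("--assay", assay), ("--desc", desc),
                          ("--detailed", detailed), ("--expand", expand)):
        if enabled:
            cmd.append(flag)
    # Assumed output name: one TSV per BioProject ID.
    cmd += ["--saveto", f"{bioproject_id}.tsv"]
    return cmd


print(shlex.join(pysradb_metadata_command("PRJNA1425250")))
# prints: pysradb metadata PRJNA1425250 --detailed --saveto PRJNA1425250.tsv
```

The defaults match the parameter values used in the workflow tests (`detailed` on, the other three off).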

## Input

The workflow needs a single plain-text (txt) input file, without a header, whose first column lists one or more BioProject IDs, one per line:

````
PRJNA1425250
PRJNA1417619
PRJNA1425251
PRJNA1417617
PRJNA1425252
PRJEB1417616
````
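
Before uploading, the input file can be checked locally. The following is a minimal sketch, not part of the workflow: the function name `read_bioproject_ids`, the tab delimiter, and the accession pattern (`PRJ`, two uppercase letters, then digits) are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed accession shape: PRJ + two-letter prefix (e.g. NA, EB) + digits.
BIOPROJECT_RE = re.compile(r"^PRJ[A-Z]{2}\d+$")


def read_bioproject_ids(path):
    """Return the IDs found in the first column, skipping blank lines.

    Columns are assumed to be tab-separated. A header line or any other
    malformed first column raises ValueError with the line number.
    """
    ids = []
    lines = Path(path).read_text().splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        first_column = line.split("\t")[0].strip()
        if not BIOPROJECT_RE.match(first_column):
            raise ValueError(
                f"line {line_no}: not a BioProject ID: {first_column!r}"
            )
        ids.append(first_column)
    return ids
```

Calling `read_bioproject_ids("bioprojects.txt")` on the example above would return the six IDs in order. A file that starts with a header line would be rejected at line 1.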

Contributor


Would you mind describing here the parameters you expose, to guide the users?


## Outputs

There are 3 main outputs:

- Data collection with the SRA manifest for each input BioProject ID
- Data collection of paired-end FASTQ files
- Data collection of single-end FASTQ files
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
- doc: Test 1 for Metadata-and-Sequences-from-BioProjectIDs
  job:
    BioProject IDs:
      class: File
      path: test-data/test1_single_prj_pe.txt
      filetype: txt
    assay (metadata download): false
    desc (metadata download): false
    detailed (metadata download): true
    expand (metadata download): false
    Group by Experiments (fastq download): false
    Group by Sample (fastq download): false
Comment on lines +11 to +12
Contributor


I think these parameters disappeared from the workflow

  outputs:
    Metadata file ( SRA table ):
      element_tests:
        PRJNA1417618:
          path: test-data/test1_metadata_file_split_file_000000.txt.tsv
    Paired End Reads:
      element_tests:
        SRR37073390:
          forward:
            path: test-data/test1_paired_end_collection_forward.fastq
            decompress: true
            compare: contains
          reverse:
            path: test-data/test1_paired_end_collection_reverse.fastq
            decompress: true
            compare: contains

- doc: Test 2 for Metadata-and-Sequences-from-BioProjectIDs
  job:
    BioProject IDs:
      class: File
      path: test-data/test2_multiple_prj_mixed.txt
      filetype: txt
    assay (metadata download): false
    desc (metadata download): false
    detailed (metadata download): true
    expand (metadata download): false
Comment on lines +36 to +39
Contributor


I would use a different combination from the first test, to increase the test coverage.

    Group by Experiments (fastq download): false
    Group by Sample (fastq download): false
Comment on lines +40 to +41
Contributor


I think these parameters disappeared from the workflow

  outputs:
    Metadata file ( SRA table ):
      element_tests:
        PRJNA1425250:
          path: test-data/test2_metadata_file_split_file_000000.txt.tsv
          compare: contains
        PRJNA1417618:
          path: test-data/test2_metadata_file_split_file_000001.txt.tsv
          compare: contains
    Paired End Reads:
      element_tests:
        SRR37273408:
          forward:
            path: test-data/test2_SRR37273408_forward.fastq
            decompress: true
            compare: contains
          reverse:
            path: test-data/test2_SRR37273408_reverse.fastq
            decompress: true
            compare: contains
        SRR37073390:
          forward:
            path: test-data/test2_SRR37073390_forward.fastq
            decompress: true
            compare: contains
          reverse:
            path: test-data/test2_SRR37073390_reverse.fastq
            decompress: true
            compare: contains
    Single End Reads:
      element_tests:
        SRR37273407:
          path: test-data/test2_SRR37273407_forward.fastq
          decompress: true
          compare: contains