Add RNA-seq Variant Discovery (RNAvar) workflow #1188
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| version: 1.2 | ||
| workflows: | ||
| - name: rna-variant-discovery | ||
| subclass: Galaxy | ||
| publish: true | ||
| primaryDescriptorPath: /RNAvar.ga | ||
| testParameterFiles: | ||
| - /RNAvar-tests.yml | ||
| authors: | ||
| - name: "Emre Gülben" | ||
| orcid: "https://orcid.org/0009-0009-9085-1055" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| # Changelog | ||
|
|
||
| ## [0.1] - 2026-03-27 | ||
|
|
||
| - First release. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # RNA-seq Variant Discovery (RNAvar) | ||
|
|
||
| This workflow is an implementation of a standard RNA-seq variant discovery pipeline. It handles the transition from raw reads to annotated variants using industry-standard tools. | ||
|
|
||
| ## Workflow Logic | ||
| 1. **Alignment:** **STAR** in 2-pass mode for high-accuracy splice-aware mapping. | ||
| 2. **Preprocessing:** **MarkDuplicates** and **GATK4 SplitNCigarReads** to prepare the BAM for variant calling by splitting reads into exon segments. | ||
| 3. **Recalibration:** **GATK4 BaseRecalibrator** (BQSR) using known polymorphic sites to adjust base quality scores. | ||
| 4. **Variant Calling:** **GATK4 HaplotypeCaller** with parameters specifically tuned for RNA-seq (e.g., ignoring soft-clipped bases). | ||
| 5. **Filtering:** **bcftools filter** applying hard filters ($FS > 30.0$, $QD < 2.0$) to reduce false positives. | ||
|
emregulben marked this conversation as resolved.
Outdated
|
||
| 6. **Annotation:** Functional annotation of variants using **SnpEff** against the hg38 database. | ||
|
Member
It makes limited sense to hard-code the SnpEff database and, thereby, restrict analysis to just one organism and genome version. The database choice should instead be turned into a workflow parameter. I would populate the user choices from the "Locally installed SnpEff databases" instead of letting the user download databases on demand. Please let us know if you need help with this.
Author
Done here: 5146320
||
|
|
||
|
Member
After presenting the current workflow logic, it would be good to have a "Limitations" section, where you discuss which features of the nf-core pipeline are currently not yet implemented. Things that I can immediately think of here are
Basically, compare https://nf-co.re/rnavar/1.2.3/docs/usage/ to what's implemented here.
Author
Added the limitations section: 5146320
||
| ## Testing and Parity | ||
| The workflow is validated against a targeted subset of human RNA-seq data on chromosome 22. It successfully identifies the 54 high-confidence variants required to match the output of the original discovery engine. | ||
|
Member
Be more precise here and state the origin of that test data.
||
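For readers following the review, the hard-filter rule quoted in step 5 of the README (FS > 30.0, QD < 2.0) can be illustrated with a small sketch. This is not the workflow's bcftools expression; the function names `parse_info` and `fails_hard_filter` are hypothetical, and treating missing annotations as passing is an assumption made only for this example.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def fails_hard_filter(info: str, max_fs: float = 30.0, min_qd: float = 2.0) -> bool:
    """Return True if a record would be flagged: FS > max_fs or QD < min_qd.

    Assumption of this sketch: a record lacking FS or QD is not flagged
    on that criterion.
    """
    fields = parse_info(info)
    fs = fields.get("FS")
    qd = fields.get("QD")
    if fs is not None and float(fs) > max_fs:
        return True
    if qd is not None and float(qd) < min_qd:
        return True
    return False


# Strand bias too high, so the record is flagged.
print(fails_hard_filter("FS=45.2;QD=10.0"))
# Both annotations are within the thresholds, so the record is kept.
print(fails_hard_filter("FS=1.0;QD=12.3;DB"))
```

Either threshold alone is enough to flag a record, which is why the two conditions are checked independently rather than combined with a logical AND.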
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| - doc: Test outline for RNAvar | ||
| job: | ||
| "Forward Reads (R1)": | ||
| class: File | ||
| path: test-data/test_rnaseq_1.fastq.gz | ||
| filetype: fastqsanger.gz | ||
| "Reverse Reads (R2)": | ||
| class: File | ||
| path: test-data/test_rnaseq_2.fastq.gz | ||
| filetype: fastqsanger.gz | ||
| "Reference Genome": | ||
| class: File | ||
| path: test-data/genome.fasta | ||
| filetype: fasta | ||
| "Genome Annotation": | ||
| class: File | ||
| path: test-data/genome.gff3 | ||
| filetype: gff3 | ||
| "Known Variants (dbSNP)": | ||
| class: File | ||
| path: test-data/dbsnp_146.hg38.vcf | ||
| filetype: vcf | ||
| "Known Indels": | ||
| class: File | ||
| path: test-data/mills_and_1000G.indels.vcf | ||
| filetype: vcf | ||
| outputs: | ||
| "Final Annotated Variants": | ||
| asserts: | ||
| # Expected variants: 54 | VCF Header lines: 36 | Total: 90. | ||
| - has_n_lines: | ||
| n: 90 | ||
| # Verify the content is scientifically correct by checking for | ||
| # the presence of the target chromosome string. | ||
| - has_text: | ||
| text: "chr22" | ||
| # Ensure SnpEff successfully added functional annotations | ||
| - has_text: | ||
| text: "ANN=" | ||
| "MultiQC Quality Report": | ||
| asserts: | ||
| - has_text: | ||
| text: "MultiQC" |
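The `has_n_lines` assertion in the test file above relies on the arithmetic in its comment: 54 variant records plus 36 header lines gives 90 lines. A minimal sketch of that count follows, assuming the usual VCF convention that header and column lines start with `#`. The helper `count_vcf_lines` and the four-line example VCF are hypothetical and are not part of the workflow or its test data.

```python
def count_vcf_lines(vcf_text: str) -> tuple[int, int]:
    """Return (header_lines, record_lines) for the text of a VCF file.

    Lines starting with '#' (meta-information and the column header)
    count as header lines; every other line counts as a variant record.
    """
    header = 0
    records = 0
    for line in vcf_text.splitlines():
        if line.startswith("#"):
            header += 1
        else:
            records += 1
    return header, records


# Tiny made-up VCF, only to show the arithmetic.
example = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr22\t100\t.\tA\tG\t50\tPASS\tANN=G|missense_variant",
    "chr22\t200\t.\tC\tT\t60\tPASS\tANN=T|synonymous_variant",
])

header, records = count_vcf_lines(example)
print(header, records, header + records)  # 2 2 4
```

Because the total includes header lines, an exact line count may shift if a tool update adds or removes header entries even when the set of called variants is unchanged.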
Here, the README should clearly state that the aim is to provide a faithful reimplementation of https://nf-co.re/rnavar and that currently there are still limitations.