Add GATK contamination check to complement VerifyBamID2 #758

Open
dorotejavujinovic wants to merge 14 commits into nf-core:dev from dorotejavujinovic:gatk-contamination-clean

Conversation

@dorotejavujinovic

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/raredisease branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Description

Adds GATK-based contamination detection to complement VerifyBamID2.

Background

  • VerifyBamID2 works well for WGS but has significant limitations with WES data
  • GATK CalculateContamination performs better on targeted sequencing (WES)
  • Having both methods for WGS provides cross-validation

Implementation

  • New subworkflow: CONTAMINATION_CHECK using GATK4 GetPileupSummaries and CalculateContamination
  • New module: PARSE_CONTAMINATION for MultiQC integration
  • Conditional interval handling: WGS (genome-wide) vs WES (target regions)
  • MultiQC configuration with color-coded thresholds
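As a rough illustration of the parsing step, the sketch below reads a CalculateContamination table and turns it into a small TSV with a status column. It assumes the table has a header row with `sample`, `contamination` and `error` columns. The function names, the output column names and both thresholds are hypothetical; the PR does not state the cut-offs it uses.

```python
import csv
import io

# Hypothetical cut-offs for illustration only; not taken from the PR.
WARN_THRESHOLD = 0.02
FAIL_THRESHOLD = 0.05


def parse_contamination_table(text):
    """Parse a tab-separated contamination table into a list of dicts.

    Assumes a header row with the columns sample, contamination and error.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append(
            {
                "sample": row["sample"],
                "contamination": float(row["contamination"]),
                "error": float(row["error"]),
            }
        )
    return rows


def classify(contamination):
    """Map a contamination fraction to a pass/warn/fail label."""
    if contamination >= FAIL_THRESHOLD:
        return "fail"
    if contamination >= WARN_THRESHOLD:
        return "warn"
    return "pass"


def to_multiqc_tsv(rows):
    """Render rows as a TSV that a MultiQC custom-content section could read."""
    lines = ["Sample\tcontamination_pct\tstatus"]
    for r in rows:
        pct = r["contamination"] * 100  # fraction -> percent
        lines.append(f"{r['sample']}\t{pct:.3f}\t{classify(r['contamination'])}")
    return "\n".join(lines) + "\n"
```

Keeping parsing, classification and rendering in separate functions makes each piece easy to test on its own, which also fits the later request to move the inline script into a module binary.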

Usage

params.run_contamination = true
params.contamination_sites = "small_exac_common_3.hg38.vcf.gz"
params.contamination_sites_tbi = "small_exac_common_3.hg38.vcf.gz.tbi"
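To make the WGS versus WES interval handling concrete, here is a minimal sketch of how the two GATK commands might be assembled. It assumes `contamination_sites` is passed as `-V` to GetPileupSummaries, that WES runs restrict `-L` to a target interval file, and that WGS runs fall back to the sites VCF itself as the interval list. The function name, file naming scheme and that fallback are assumptions, not the PR's actual implementation.

```python
def build_contamination_commands(sample_id, bam, sites_vcf, target_intervals=None):
    """Return the GetPileupSummaries and CalculateContamination command lines.

    target_intervals is a hypothetical path to WES target regions; when it is
    None (WGS), the sites VCF is reused as the interval list (an assumption).
    """
    intervals = target_intervals if target_intervals else sites_vcf
    pileups = f"{sample_id}.pileups.table"
    contamination = f"{sample_id}.contamination.table"

    get_pileups = [
        "gatk", "GetPileupSummaries",
        "-I", bam,
        "-V", sites_vcf,
        "-L", intervals,
        "-O", pileups,
    ]
    calculate = [
        "gatk", "CalculateContamination",
        "-I", pileups,
        "-O", contamination,
    ]
    return [get_pileups, calculate]


# Example: a WES sample restricted to a (hypothetical) target BED file.
for cmd in build_contamination_commands(
    "sample1", "sample1.bam", "small_exac_common_3.hg38.vcf.gz", "targets.bed"
):
    print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps quoting out of the picture if the commands are later run with `subprocess.run`.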

Testing

Tested on both WGS and WES samples with successful integration into MultiQC reports.

Doroteja Vujinovic and others added 7 commits November 26, 2025 11:06
- Add CONTAMINATION_CHECK subworkflow using GATK4
- Add PARSE_CONTAMINATION module for MultiQC integration
- Add GATK4 GetPileupSummaries and CalculateContamination modules
- Implement conditional intervals handling (WGS vs WES)
- Update workflow to integrate contamination check after QC_BAM
- Configure MultiQC to display contamination results
Updated GATK contamination configuration for clarity and consistency.
Introduced GATK contamination check for WES/WGS samples, added new parameters and subworkflow, and updated MultiQC configuration.
@nf-core-bot

nf-core-bot commented Dec 15, 2025

Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@dorotejavujinovic changed the title from "Gatk contamination clean" to "Add GATK contamination check to complement VerifyBamID2" on Dec 15, 2025

@ramprasadn ramprasadn left a comment

Collaborator

Thanks for the PR @dorotejavujinovic!

  ext.prefix = { "${meta.id}_sorted_md" }
  publishDir = [
-     enabled: !params.save_mapped_as_cram,
+     enabled: true,

I am curious: why do you want to change this?

  withName: '.*ALIGN:ALIGN_BWA_BWAMEM2_BWAMEME:SAMTOOLS_INDEX_MARKDUP' {
  publishDir = [
-     enabled: !params.save_mapped_as_cram,
+     enabled: true,

and this

script:
def prefix = task.ext.prefix ?: "${meta.id}"
"""
#!/usr/bin/env python3

Can you make this a module binary? We have had issues in the past with some systems interpreting indents differently.

v.write('"${task.process}":\\n')
v.write(' python: "3.11"\\n')
"""
}

Also, can you add a stub section?

Comment thread CHANGELOG.md Outdated
Comment on lines +8 to +14
### Added

- Added GATK contamination check for WES/WGS samples as complement to VerifyBamID2
- New parameters: `run_contamination`, `contamination_sites`, `contamination_sites_tbi`
- CONTAMINATION_CHECK subworkflow using GATK4 GetPileupSummaries and CalculateContamination
- PARSE_CONTAMINATION module for MultiQC integration
- Contamination results displayed in MultiQC with color-coded thresholds

You can add your log entries to 2.7.0dev since it's the one in development. And don't forget to link the PR to your entries ;)

Also, we have a separate table for parameters and new tools under the ##Fixed section of 2.7.0dev, so you can add that information there.

Could you add a test for this subworkflow? We are currently in the process of adding subworkflow level tests using nf-test, so it would be fantastic if you can include one for this subworkflow.

Doroteja Vujinovic added 5 commits March 23, 2026 14:33
…f-tests

- Convert parse_contamination inline Python to module binary (bin/)
- Add stub section to parse_contamination module
- Move CHANGELOG entries into 2.7.0dev section with PR links and tables
- Revert unrelated publishDir changes in align config
- Add nf-test subworkflow tests for contamination_check (WGS + WES)
Missing closing brace and installed_by field caused nf-core lint to
crash with TypeError during dependency recreation.
Update git_sha to ae8cd884f895 which matches the actual installed
module versions (GATK 4.6.2.0 with nf-core code style updates).
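The two lint fixes above (malformed JSON and a missing `installed_by` field) are the kind of problem a small pre-lint check can catch. The sketch below assumes the usual `modules.json` layout of `repos` → repository URL → `modules`/`subworkflows` → organisation → component entry. That layout, the list of required keys and the function name are assumptions for illustration, not part of nf-core tools.

```python
import json

# Keys each installed component entry is assumed to carry.
REQUIRED_KEYS = ("branch", "git_sha", "installed_by")


def check_modules_json(text):
    """Return a list of human-readable problems found in a modules.json string."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # A missing closing brace ends up here.
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]

    problems = []
    for repo in data.get("repos", {}).values():
        for kind in ("modules", "subworkflows"):
            for org, entries in repo.get(kind, {}).items():
                for name, entry in entries.items():
                    for key in REQUIRED_KEYS:
                        if key not in entry:
                            problems.append(f"{kind}/{org}/{name}: missing '{key}'")
    return problems
```

An empty list means the file parsed and every entry carried the assumed keys; anything else can be reported before running `nf-core lint .`.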
3 participants