Fix/52 pcgr skip high variant count#32
Open
qclayssen wants to merge 5 commits into
When select_pcgr_variants cannot bring the PASS count below MAX_SOMATIC_VARIANTS via tiered filtering, it raises RuntimeError, aborting the entire report step. This left sash with no usable SMLV_SOMATIC_REPORT output for samples with very high variant counts (e.g. L2100242 with 595,416 PASS variants, which reflects high CNA complexity rather than true hypermutation).

Catch the RuntimeError, log a warning, and skip prepare_vcf_somatic and run_somatic. Non-PCGR outputs (bcftools stats, AF distributions, variant counts) continue to publish. A companion sash change marks the PCGR emits as optional.

Refs: umccr/sash#52
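The skip-on-overflow control flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `report.py`: the function bodies, the argument names, the small `max_variants` default, and the list of step names are all hypothetical stand-ins chosen so the example runs on its own.

```python
import logging

logger = logging.getLogger(__name__)


def select_pcgr_variants(pass_count, retained_count, max_variants):
    """Hypothetical stand-in: fail when retained variants alone exceed the cap."""
    if pass_count <= max_variants:
        return pass_count
    if retained_count > max_variants:
        raise RuntimeError("cannot reduce PASS variants below the cap")
    return max_variants


def entry(pass_count, retained_count, max_variants=10):
    """Return the names of the steps that would run (illustrative only)."""
    # Non-PCGR outputs are produced regardless of the PCGR outcome.
    steps = ["bcftools_stats", "af_distributions", "variant_counts"]
    try:
        select_pcgr_variants(pass_count, retained_count, max_variants)
    except RuntimeError as err:
        # Log and skip PCGR instead of aborting the whole report step.
        logger.warning("Skipping PCGR: %s", err)
        return steps
    steps += ["prepare_vcf_somatic", "run_somatic"]
    return steps
```

Under these assumptions, `entry(100, 50)` returns only the non-PCGR steps, while `entry(5, 0)` also includes `prepare_vcf_somatic` and `run_somatic`.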
Add three tests:
- select_pcgr_variants raises RuntimeError when all SAGE_HOTSPOT variants (RETAIN_FIELDS) make tiered filtering impossible
- entry() skips prepare_vcf_somatic + run_somatic when the RuntimeError is caught (core of the sash #52 fix)
- entry() calls run_somatic normally when count is within the limit (regression guard)
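The two `entry()` tests listed above could follow a pattern like the one below. It is a sketch only: the `make_report_module` helper, the simplified `entry(vcf)` signature, and its boolean return value are hypothetical stand-ins for the real `report.py` module, used so the example is self-contained. The mocking calls themselves come from the standard library's `unittest.mock`.

```python
from types import SimpleNamespace
from unittest import mock


def make_report_module():
    """Hypothetical stand-in for the report module under test."""
    ns = SimpleNamespace()
    ns.select_pcgr_variants = lambda vcf: vcf
    ns.prepare_vcf_somatic = lambda vcf: vcf
    ns.run_somatic = lambda vcf: None

    def entry(vcf):
        try:
            selected = ns.select_pcgr_variants(vcf)
        except RuntimeError:
            return False  # PCGR skipped
        ns.run_somatic(ns.prepare_vcf_somatic(selected))
        return True

    ns.entry = entry
    return ns


def test_entry_skips_pcgr_on_overflow():
    report = make_report_module()
    with mock.patch.object(
        report, "select_pcgr_variants", side_effect=RuntimeError("too many")
    ), mock.patch.object(report, "prepare_vcf_somatic") as prepare, \
            mock.patch.object(report, "run_somatic") as run_somatic:
        assert report.entry("sample.vcf.gz") is False
    prepare.assert_not_called()
    run_somatic.assert_not_called()


def test_entry_runs_pcgr_within_limit():
    report = make_report_module()
    with mock.patch.object(report, "run_somatic") as run_somatic:
        assert report.entry("sample.vcf.gz") is True
    run_somatic.assert_called_once()
```

Patching `select_pcgr_variants` with `side_effect=RuntimeError(...)` avoids needing a fixture with hundreds of thousands of variants to reach the overflow path.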
PCGR writes PCGR_MUTATION_HOTSPOT=. (a Type=String placeholder) on every non-hotspot variant. cyvcf2 returns this as the string '.', which Python evaluates as truthy, so any(variant.INFO.get(e) ...) always returned True. As a result ALL variants were treated as retained, variants_sorted stayed empty, and select_pcgr_variants raised RuntimeError for any sample with >450k PASS variants (the sash #52 root cause).

Fix: exclude '.' alongside None so only genuinely set String/Flag fields trigger retention. Adds a regression test with a PCGR_MUTATION_HOTSPOT=. fixture to prevent silent recurrence.
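The truthiness problem can be reproduced without cyvcf2. In the sketch below a plain dict stands in for `variant.INFO` (an assumption made so the example is self-contained), and the two-element `RETAIN_FIELDS` tuple is an assumed subset rather than the real list in `report.py`. The function names are hypothetical.

```python
# Assumed subset of the real RETAIN_FIELDS, for illustration only.
RETAIN_FIELDS = ("SAGE_HOTSPOT", "PCGR_MUTATION_HOTSPOT")


def is_retained_buggy(info):
    # '.' is a non-empty string, so it counts as truthy here.
    return any(info.get(e) for e in RETAIN_FIELDS)


def is_retained_fixed(info):
    # Treat both a missing key (None) and the '.' placeholder as "not set".
    return any(info.get(e) not in (None, ".") for e in RETAIN_FIELDS)


placeholder_only = {"PCGR_MUTATION_HOTSPOT": "."}
print(is_retained_buggy(placeholder_only))  # True
print(is_retained_fixed(placeholder_only))  # False
```

A Flag field that is present (represented here as `True`) or a String field with a real value still triggers retention under the fixed check.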
Pull request overview
This PR addresses sash issue #52 by preventing PCGR processing from breaking samples with very high variant counts when tiered filtering cannot reduce PASS variants below MAX_SOMATIC_VARIANTS, and by fixing retained-variant detection for PCGR_MUTATION_HOTSPOT when PCGR writes a dot (.) placeholder value.
Changes:
- Catch `RuntimeError` from `select_pcgr_variants()` in `entry()` and skip PCGR entirely when the variant cap cannot be resolved.
- Fix retained-variant detection so the `INFO` string placeholder `'.'` does not cause false retention during tiered filtering.
- Add/extend tests covering the dot-placeholder hotspot bug, the "all retained variants" overflow case, and the "skip PCGR on overflow" CLI behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `tests/test_pcgr_hypermutated.py` | Adds regression and integration tests for PCGR trimming/retention edge cases and ensures `entry()` skips PCGR when trimming cannot reduce counts. |
| `bolt/workflows/smlv_somatic/report.py` | Implements skip-on-unresolvable-overflow behavior in `entry()` and fixes retained-variant detection to ignore `'.'` placeholder values. |
scwatts (Member) approved these changes on Jun 5, 2026 and left a comment:
Any information on testing done or plans to test would be great!
See sash issue