Fix/52 pcgr skip high variant count #32

Open
qclayssen wants to merge 5 commits into release/0.3.0 from fix/52-pcgr-skip-high-variant-count

@qclayssen qclayssen commented Jun 2, 2026

Collaborator

qclayssen added 2 commits June 2, 2026 10:03
When select_pcgr_variants cannot bring the PASS count below
MAX_SOMATIC_VARIANTS via tiered filtering it raises RuntimeError,
aborting the entire report step. This left sash with no usable
SMLV_SOMATIC_REPORT output for samples with very high variant counts
(e.g. L2100242 with 595,416 PASS variants — high CNA complexity, not
a true hypermutator).

Catch the RuntimeError, log a warning, and skip prepare_vcf_somatic
and run_somatic. Non-PCGR outputs (bcftools stats, AF distributions,
variant counts) continue to publish. Companion sash change marks the
PCGR emits as optional.

Refs: umccr/sash#52

Add three tests:
- select_pcgr_variants raises RuntimeError when every variant is a
  SAGE_HOTSPOT variant (listed in RETAIN_FIELDS), so tiered filtering
  cannot remove anything
- entry() skips prepare_vcf_somatic and run_somatic when the
  RuntimeError is caught (core of the sash #52 fix)
- entry() calls run_somatic normally when the count is within the limit
  (regression guard)
@qclayssen qclayssen self-assigned this Jun 2, 2026
qclayssen added 3 commits June 3, 2026 11:36
PCGR writes PCGR_MUTATION_HOTSPOT=. (Type=String placeholder) on every
non-hotspot variant. cyvcf2 returns the string '.' which Python evaluates
as truthy, so any(variant.INFO.get(e) ...) always returned True — ALL
variants were treated as retained, variants_sorted stayed empty, and
select_pcgr_variants raised RuntimeError for any sample with >450k
PASS variants (sash #52 root cause).

Fix: exclude '.' alongside None so only genuinely set String/Flag fields
trigger retention. Adds regression test with PCGR_MUTATION_HOTSPOT=.
fixture to prevent silent recurrence.
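
The truthiness problem described in this commit can be illustrated with a minimal sketch. It models a variant's INFO as a plain dict rather than a cyvcf2 object, and the contents of `RETAIN_FIELDS` as well as the helper names `is_retained_buggy` and `is_retained_fixed` are assumptions made for illustration, not the actual code in report.py.

```python
# Hypothetical field list; the real RETAIN_FIELDS in bolt may differ.
RETAIN_FIELDS = ("SAGE_HOTSPOT", "PCGR_MUTATION_HOTSPOT")


def is_retained_buggy(info):
    # Relies on truthiness: the placeholder string "." is non-empty,
    # so it counts as truthy and the variant looks retained.
    return any(info.get(field) for field in RETAIN_FIELDS)


def is_retained_fixed(info):
    # Treats both a missing field (None) and the "." placeholder as unset.
    return any(info.get(field) not in (None, ".") for field in RETAIN_FIELDS)


# A non-hotspot variant as PCGR writes it: placeholder only.
non_hotspot = {"PCGR_MUTATION_HOTSPOT": "."}
# A variant carrying a Flag field (assumed here to be read back as True).
sage_hotspot = {"SAGE_HOTSPOT": True}

print(is_retained_buggy(non_hotspot))  # placeholder wrongly counts as retained
print(is_retained_fixed(non_hotspot))  # placeholder is ignored
print(is_retained_fixed(sage_hotspot))  # genuinely set field still retains
```

With the buggy check every variant carrying the placeholder is retained, so nothing is left for tiered filtering to remove, which matches the failure mode described above.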

Copilot AI left a comment

Pull request overview

This PR addresses sash issue #52 by preventing PCGR processing from aborting the report step for samples with very high PASS variant counts when tiered filtering cannot reduce them below MAX_SOMATIC_VARIANTS, and by fixing retained-variant detection for PCGR_MUTATION_HOTSPOT when PCGR writes a dot (.) placeholder value.

Changes:

  • Catch RuntimeError from select_pcgr_variants() in entry() and skip PCGR entirely when the variant cap cannot be resolved.
  • Fix retained-variant detection so INFO string placeholder '.' does not cause false retention during tiered filtering.
  • Add/extend tests covering the dot-placeholder hotspot bug, the “all retained variants” overflow case, and the “skip PCGR on overflow” CLI behavior.
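
The skip-on-overflow behavior in the first bullet can be sketched roughly as follows. This is a simplified stand-in, not the code in report.py: `run_report`, the step names as list entries, the stub body of `select_pcgr_variants`, and the value 450,000 for `MAX_SOMATIC_VARIANTS` (inferred from the ">450k" in the commit message) are all assumptions.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed value, inferred from ">450k PASS variants" in the commit message.
MAX_SOMATIC_VARIANTS = 450_000


def select_pcgr_variants(pass_count, retained_count):
    """Stub: tiered filtering can drop anything except retained variants."""
    if pass_count <= MAX_SOMATIC_VARIANTS:
        return pass_count
    if retained_count > MAX_SOMATIC_VARIANTS:
        raise RuntimeError(
            f"cannot reduce {pass_count} PASS variants below {MAX_SOMATIC_VARIANTS}"
        )
    return MAX_SOMATIC_VARIANTS


def run_report(pass_count, retained_count):
    """Hypothetical entry point returning the names of the steps that ran."""
    # Non-PCGR outputs are produced regardless of the variant count.
    steps = ["bcftools_stats", "af_distributions", "variant_counts"]
    try:
        select_pcgr_variants(pass_count, retained_count)
    except RuntimeError as err:
        # Log and skip PCGR instead of aborting the whole report step.
        logger.warning("Skipping PCGR: %s", err)
        return steps
    steps += ["prepare_vcf_somatic", "run_somatic"]
    return steps


print(run_report(595_416, 595_416))  # PCGR steps skipped
print(run_report(1_000, 10))         # PCGR steps run
```

Returning early from the `except` branch keeps the non-PCGR outputs intact, which is why the companion sash change needs the PCGR emits to be optional.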

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_pcgr_hypermutated.py | Adds regression and integration tests for PCGR trimming/retention edge cases and ensures entry() skips PCGR when trimming cannot reduce counts. |
| bolt/workflows/smlv_somatic/report.py | Implements skip-on-unresolvable-overflow behavior in entry() and fixes retained-variant detection to ignore '.' placeholder values. |

@qclayssen qclayssen marked this pull request as ready for review June 5, 2026 05:21
@qclayssen qclayssen requested a review from scwatts June 5, 2026 05:21

@scwatts scwatts left a comment

Member

Any information on testing done or plans to test would be great!

3 participants