Skip to content

Plot Plugins#653

Open
Colelyman wants to merge 1 commit into
pinellolab:masterfrom
edilytics:master
Open

Plot Plugins#653
Colelyman wants to merge 1 commit into
pinellolab:masterfrom
edilytics:master

Conversation

@Colelyman

Copy link
Copy Markdown
Contributor

This PR extracts the data preparation for the CRISPResso plots into individual functions, so that they can be called independently, and make the plot plugins in CRISPRessoPro possible.

* Sam/pixi integration tests (#156)

* Guardrail dropdowns (#155)

* Add discovery of Python packages (#151)

* Fix Docker entrypoint

* Attempt to fix Circle CI pip install

* Run each CRISPResso command in the conda environment

* Update `--base_editor_output` parameter name for Circle CI

* Update tests Makefile and batch expected output

* Fix QWC inference across amplicons (#137)

* Mckay/be plot improvements (#136)

* trying to get the figure to fit nicely, increased
element size to 100

* custom figsize to display without cutting off

increased figsize in report template

* Allow messages to be served in CLI reports (#134) (pinellolab#583)

* Fix deletion at second position (#131)

* Fix bug when there is a deletion starting at the second position

This bug only happens when a deletion starts are the second position, before the
fix, it would report that the deletion started at the first position. It is
fixed now, so deletions at the second position are reported correctly.

* Update CRISPRessoCOREResources.c due to change in .pyx

* Add tests for find_indels_substitutions for deletions at the end

* Fix 1bp deletions at the end, and off by one error

This ensures that when a deletion occurs at the end of a read, the entire
deletion is accounted for.

* Update CRISPRessoCOREResources.c to reflect fixes for deletions at the end of alignments

* Add extra asserts to deletion checks

* Point to new test branch

* Reafctor deletion_coordinates to go past the end of the string for deletions at
the end of the sequence

* Point tests to master

* Allow messages to be served in CLI reports

* Point to cole/messages test branch

* Point tests back to master

* point to tests branch

* typo

* testing github actions

* remove test

* point tests to master

---------

Co-authored-by: Cole Lyman <Cole@colelyman.com>

* Update inferred QWC tests to reflect correct intended behavior

* Fix inferring QWC to match intended behavior

* Add more test cases and fix bug discovered in single bp QWC

* Add even more test cases testing indels outside the QWC

* Point tests to cole/fix-qwc-deletion

* update plotly.js (#138)

* Change order of amplicon inference alignment so that 1st amplicon is the reference

This makes a difference because it changes the values of `s1inds`, and therefore
the value of the inferred quantification window coordinates.

* Point integration tests back to master

* Update CHANGELOG.md

---------

Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>

* Update setup.py and pyproject.toml to find all modules and packages

* Convert pyproject.toml to Unix file endings

---------

Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>

* Guardrail dropdowns

* Revert changes to .c files

* Closed guardrail warning by default

* .c fix

* Revert .c files to match master

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert .c files back to master

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: Cole Lyman <Cole@colelyman.com>
Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Run integration tests

* Fix pixi env

* Remove pro from pixi.toml

* Pull branches

* Adding Pro token

* Add pip install kaleido for pro tests

* Pro compare

* Pin plotly

* Pin kaleido

* Fix kaleido pin

* Fix pixi plotly pin

* Ruff fix

* Update integration_tests.yml

Remove branch ref for Pro

* Run the diffs and print the results in integration tests

* Point tests to master

---------

Co-authored-by: Cole Lyman <Cole@colelyman.com>
Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PlotContext dataclass to CRISPResso2

PlotContext is the only part of the plugin architecture that lives in
CRISPResso2. It provides a read-only view into CORE's computed data
that CRISPRessoPro and plugins consume to generate custom plots.

- Zero-copy: wraps references to CORE's local variables
- Scope fields (ref_name, sgRNA_ind) are mutable for iteration
- HDR vectors and config fields have sensible defaults

6 passing tests. All 591 existing tests unaffected.

* feat: add Pro hooks to CRISPRessoCORE and CRISPRessoShared

CRISPRessoCORE.py:
- Plot hook: After plot pool joins, constructs PlotContext and calls
  pro_hooks.on_plots_complete() when Pro is installed. Plugin plots
  run here. Errors handled per halt_on_plot_fail.
- Report hook: Routes make_report to Pro when installed, falls back
  to CRISPRessoReport.make_report when not.

CRISPRessoShared.py:
- getCRISPRessoArgParser now loads Pro's args.json if CRISPRessoPro
  is installed, adding --report_config, --exclude_plots, etc.
- Refactored arg loading into _add_args_from_dict helper.

All 591 existing tests pass (590 + 1 env-dependent skip).
Integration tested: Pro report generates correctly with --use_matplotlib.

* docs: clarify PlotContext mutability contract

Replace 'read-only' framing with an explicit warning that mutation
will corrupt results. Python can't enforce immutability on dicts and
arrays — frozen=True only prevents field reassignment, not mutation
of the underlying data structures. Document it as a contract instead.

* Add pro environment

* refactor: rename CRISPResso2/plotting to CRISPResso2/plots

Matches CRISPRessoPro/plots convention. Updated all imports in
CRISPRessoCORE.py, plots/__init__.py, and test_plot_context.py.

* feat: gate CORE plot generation through Pro registry

When CRISPRessoPro is installed, builds the PlotRegistry before the
plot generation block and wraps the plot() function. Each call checks
the filename prefix from the args dict against the set of enabled
prefixes derived from the registry. Disabled plots are skipped
entirely — no rendering, no PNG on disk.

The registry is built once, respects --report_config and
--exclude_plots/--include_plots, and falls through gracefully if
anything fails (all plots run as before).

When Pro is NOT installed, plot() is unchanged — all plots generate.

Covers all 5 filename root key variants used across the 38 built-in
plots: fig_filename_root, fig_root, plot_root, plot_path,
piechart_plot_root, barplot_plot_root.

* revert: remove plot gating from CRISPRessoCORE

The registry/config/prefix-matching logic was Pro implementation
leaked into the open-source codebase. The correct approach per the
design doc is CRISPRessoPlotData.py — extract data prep into pure
functions so Pro can generate its own plots without any plugin
machinery in CORE.

* feat: extract Batch 1 data prep functions into CRISPRessoPlotData

New module CRISPResso2/plots/data_prep.py with 5 pure functions:
- prep_read_barplot (plot_1a)
- prep_class_piechart_barplot (plot_1b/1c)
- prep_allele_homology (plot_1e)
- prep_amplicon_modifications (plot_4a)
- prep_modification_frequency (plot_4b)

All Pattern A (trivial pass-through). CORE now calls these instead
of building dicts inline. Output is identical — verified by
integration tests with diff against expected results.

10 new unit tests, 600 total passing.

* refactor: revert trivial pass-through extractions, keep computed ones

Removed prep_read_barplot, prep_class_piechart_barplot, and
prep_allele_homology — these were pure packaging with zero reuse
value. Pro can build those dicts directly from PlotContext.

Kept prep_amplicon_modifications (computes y_max, builds plot_titles)
and prep_modification_frequency (builds plot_title with ref_name logic)
since they have computation Pro would otherwise duplicate.

5 unit tests, integration verified with diff.

* fix: pass custom_config to PlotContext construction

Pro data_funcs need custom_config['colors'] to generate plots.

* refactor: use inline-snapshot for data_prep tests, pass custom_config to PlotContext

* build: add inline-snapshot to test and pro pixi environments

* build: ignore N815 lint rule for domain-specific mixed-case class vars

* cleanup: empty plots/__init__.py, remove unused re-export

* extract prep_indel_size_distribution (plot_3a)

99th percentile clipping of xmin/xmax from histogram data, with
±15 clamping. Also extracts _clip_to_percentile helper.

5 new unit tests, CORE wired to call data_prep function.
Integration verified — output identical to expected.

* extract prep_frequency_deletions_insertions (plot_3b)

Triple 99th percentile clipping for ins/del/sub histograms using
hdensity.sum() as the total. Adds total= parameter to
_clip_to_percentile helper.

4 new unit tests, CORE wired to call data_prep function.
Integration verified — output identical to expected.

* refactor: use whole-dict snapshots in data_prep tests

Wrap entire result in snapshot() instead of cherry-picking keys.
Gives full coverage — any new/changed key auto-updates via
--inline-snapshot=fix. Recursive _to_serializable handles numpy.

* docs: add data prep extraction TODO list to CLAUDE.md

* docs: move data prep TODOs from CLAUDE.md to pi todos

* extract prep_dsODN_piechart (plot_1d)

* extract prep_nucleotide_quilt (plot_2a) and prep_nucleotide_quilt_around_sgRNA (plot_2b)

* extract prep_global_modifications_reference (plot_4e/4f)

* extract prep_log_nuc_freqs (plot_10d)

* extract prep_conversion_at_sel_nucs (plot_10e/10f/10g)

* Run diffs on the plots in integration tests

* Correct diff-plots now

* Pin matplotlib in test dependencies and update running integration tests

* Fix cross-pollution of integration tests

* Install MS core fonts in CI for consistent plot rendering

CRISPResso2 requests Arial as the primary font via matplotlib's
font.sans-serif setting. Without MS core fonts installed, the CI
runner falls back to a different font, causing upset plots (and
potentially others) to exceed the RMSE image comparison threshold.

This matches the Dockerfile which already installs
ttf-mscorefonts-installer. Also clears matplotlib's font cache
so it discovers the newly installed fonts.

* Remove matplotlib pin in tests and don't run diff-plots with Pro

* Reorder base edit plots and update caption in Compare

* Rebuild pixi fontconfig cache after MS font installation in CI

The pixi/conda environment has its own fontconfig with a separate
font cache (.pixi/envs/test/var/cache/fontconfig/). When the pixi
environment is restored from actions/cache, this fontconfig cache
is stale and doesn't know about the MS core fonts installed via
apt-get. Running the system fc-cache only updates the system
fontconfig, not the pixi one.

Fix: run 'pixi run -e test -- fc-cache -f' to rebuild the pixi
fontconfig cache so that matplotlib's findSystemFonts() discovers
Arial via the pixi fc-list.

* Update base editing caption in compare

* Sort Compare allele plots by aligned sequence as well

* Attempt to fix sorting of compare allele plots

* Sort the df being passed into the plot in compare!

* Sort the bottom compare plots too!

* Make the tiebreaking logic in compare enrichment consistent

* Cache MS fonts in pixi

* Point to pro plot-plugin branch

* Extract prep_hdr_nucleotide_quilt (plot_4g) into data_prep.py

Pattern C extraction: moves HDR cross-ref nucleotide quilt DataFrame
assembly from CORE into a reusable data_prep function. The function
builds multi-batch nuc_pct_df and mod_pct_df across all references
with non-zero counts. CORE now calls the extracted function and reads
back the DataFrames for CSV writes and plot input.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Extract prep_global_frameshift_data (plot_5a/6a/8a) into data_prep.py

Pattern C extraction: moves global frameshift/splice/inframe count
aggregation across all coding-seq refs from CORE into a reusable
data_prep function. One shared aggregation feeds three plots: 5a
(global frameshift pie), 6a (global frameshift profiles), and 8a
(global splice sites). HDR special-case logic (adding unmodified
reads to non-frameshift bucket) is preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Extract prep_pe_nucleotide_quilt (plot_11a) into data_prep.py

Pattern C extraction: moves PE cross-ref nucleotide quilt DataFrame
assembly from CORE into a reusable data_prep function. Delegates to
prep_hdr_nucleotide_quilt (same computation) and overrides
quantification_window_idxs with the first reference's include_idxs
(PE amplicons share the same quantification window, unlike HDR).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Extract prep_pe_nucleotide_quilt_around_sgRNA (plot_11b) into data_prep.py

Pattern C extraction: moves PE window-slicing logic from CORE into a
reusable data_prep function. The computation is identical to
prep_nucleotide_quilt_around_sgRNA (plot_2b) but applied to multi-batch
PE DataFrames from prep_pe_nucleotide_quilt. Delegates to the existing
function to avoid duplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix ruff linting

* Extract prep_alleles_around_cut (plot_9) into data_prep

Pattern D extraction: separates pct adjustment, groupby collapse,
and sgRNA interval recomputation from CORE's I/O and plot calls.
Window clamping and CRISPRessoShared.get_dataframe_around_cut_asymmetrical
stay in CORE; CSV writes and prep_alleles_table stay in CORE.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Extract prep_base_edit_quilt (plot_10h) into data_prep

Pattern D extraction: separates pct adjustment, groupby collapse,
sgRNA interval recomputation, and x_labels computation from CORE's
I/O and plot calls. Similar to prep_alleles_around_cut but uses a
symmetric window and computes conversion nucleotide positions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Extract prep_amino_acid_table (plot_9a) into data_prep

Pattern D extraction: encapsulates amino acid conversion, cut point
calculation, and CRISPRessoShared.get_amino_acid_dataframe call.
Uses lazy import of CRISPRessoShared to keep data_prep's top-level
imports clean. CSV write stays in CORE.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Audit cleanup: remove dead code, consolidate duplicates, document mutations

1. Delete dead to_numeric_ignore_columns from CRISPRessoCORE.py
   — all callers were extracted to data_prep.py which has its own copy

2. Fix duplicate test_import_from_module in test_plot_context.py
   — Python silently overwrote the first with the second

3. Consolidate prep_alleles_around_cut and prep_base_edit_quilt
   — Extract shared logic into _prep_windowed_alleles helper
   — Both functions now delegate to the helper, adding only their
     unique fields (new_cut_point vs x_labels)

4. Document DataFrame mutation contract
   — Module docstring updated to note the exception
   — _prep_windowed_alleles, prep_alleles_around_cut, and
     prep_base_edit_quilt all have .. warning:: blocks
   — New tests verify mutation identity (returned df IS input df)
     and in-place column modification behavior

* Audit items 5-7: PlotContext nuc summary defaults, HDR vector wiring, remove _jp from data_prep

5. Make nucleotide_frequency_summary and nucleotide_percentage_summary
   optional in PlotContext (default_factory=dict). These are never
   populated by CRISPRessoCORE — they're reserved for Batch/Aggregate.
   Remove the always-{} arguments from the CORE construction call.

6. Wire ref1_all_* vectors into PlotContext construction when
   expected_hdr_amplicon_seq or prime_editing_pegRNA_extension_seq is
   set. Previously these fields always defaulted to {} even when the
   data was available.

7. Remove _jp parameter from prep_global_modifications_reference in
   data_prep.py. The caller (CORE) now computes plot_root and
   plot_title directly and passes them in, keeping path logic in CORE
   where it belongs.

* Move Pro plotting to be before core plotting

* Add parallel pytest dependency

* Add N_READS_INPUT and N_READS_AFTER_PREPROCESSING to alignment_stats

* Add the rest of the plots into the elif to be run on CRISPRessoPro

* Fix unit tests by moving tests for _to_numeric_ignore_columns

* Split up plot_class_piechart_and_barplot into separate functions

* Refactor all plot data prep functions to use PlotContext

* Minor clean up, rename fig_filename_root to plot_root and add types to plot_context

* Fix linting errors

* Point integrations tests to master

* Point tests to cole/plot-plugin branch

* Add a test-pro pixi environment

* Add alt_nuc_counts_by_ref to PlotContext

* Extract out more data prep functions and move amino_acids_to_numbers

* Extract more data_prep functions

* Replace plot_root with fig_filename_root

* Remove _global_data and move imports

* Extract out duplicated code and improve data_prep unit tests

* Rename PlotContext to CorePlotContext and add contexts for other modes

* Move base editing utilities from CRISPRessoCORE to data_prep; add tests

- Move get_refpos_values, get_bp_substitutions, get_upset_plot_counts,
  get_base_edit_target_sequence, write_base_edit_counts out of
  CRISPRessoCORE.py into plots/data_prep.py, eliminating the circular
  import direction (data_prep no longer imports from CORE)

- Extract inline plot_10i logic (~50 lines) from CRISPRessoCORE into
  prep_base_edit_upset; CORE now calls CRISPRessoPlotData.prep_base_edit_upset
  and passes gap_incentive from the ref dict

- Add module-level logger = logging.getLogger(__name__) to data_prep.py
  following the established codebase pattern; remove the misnamed
  _base_edit_logger

- Move all base editing utility tests from test_CRISPRessoCORE.py into
  test_data_prep.py as TestGetRefposValues, TestGetBpSubstitutions,
  TestGetBaseEditTargetSequence, TestGetUpsetPlotCounts,
  TestWriteBaseEditCounts; remove three duplicate/dead clusters from
  test_CRISPRessoCORE.py; write_base_edit_counts test now uses tmp_path

- Add unit tests for prep_alleles_around_cut (pct recalculation,
  new_cut_point in/out of sgRNA interval, groupby collapse) and
  prep_amino_acid_table (plot_input populated above threshold,
  annotate_wildtype_allele label suffix)

- Add sort-order comments in CRISPRessoCompareCORE.py explaining
  deterministic tiebreaker columns and LFC sort direction

- Fix all ruff lint errors (F401 unused imports, E303 blank lines,
  D413 missing blank line after docstring section)

* Pass plot_context to Pro make_report

* Clean up some CRISPRessoCORE plotting parameters

* cleanup: remove redundant variable unpacking, route plot 10i through plot() wrapper

- Remove ref_seq, include_idxs_list, quantification_window_ref_seq,
  tot_aln_reads, cut_points, sgRNA_orig_sequences, sgRNA_names from
  per-ref loop top; move CSV-write variables into the crispresso1_mode
  block where they're used, replace others with refs[ref_name] lookups
- Route plot_combination_upset through the plot() multiprocessing
  wrapper instead of calling it directly
- Add design doc for multi-mode plot plugin work (all 5 remaining modes)

* feat: Pooled plot plugin — context, data prep, Pro hooks

CRISPResso2 side of the Pooled plot plugin:

- Add prep_reads_total() and prep_unmod_mod_pcts() to data_prep.py
  with a prefix parameter for reuse across Pooled/WGS/Aggregate modes
- Construct PooledPlotContext in CRISPRessoPooledCORE before plotting
- Add if C2PRO_INSTALLED branch that calls pro_hooks.on_pooled_plots_complete()
- Refactor elif branch to use prep functions instead of inline args
- Add Pro report hook: pro_hooks.make_pooled_report() when Pro installed
- Add unit tests for both prep functions including cross-context test
  with WGSPlotContext

* fix: pass plot_context to make_pooled_report for config access

* feat: WGS plot plugin — context, data prep, Pro hooks

CRISPResso2 side of the WGS plot plugin:

- Reuse prep_reads_total() and prep_unmod_mod_pcts() from data_prep.py
  (shared with Pooled, added to this branch via cherry-pick)
- Construct WGSPlotContext in CRISPRessoWGSCORE before plotting
- Add if C2PRO_INSTALLED branch calling pro_hooks.on_wgs_plots_complete()
- Refactor elif branch to use prep functions
- Add Pro report hook: pro_hooks.make_wgs_report()
- Preserves WGS-specific metadata keys (reads_summary_plot,
  modification_summary_plot) for backward compat

* feat: Compare plot plugin — context fields, data prep, Pro hooks

CRISPResso2 side of the Compare plot plugin:

- Add 11 mutable scope fields to ComparePlotContext for per-amplicon
  loaded data (profile_1/2, mod_freqs_1/2, consensus_sequence,
  quant_windows_1/2, cut_points, sgRNA_intervals, mod_type, allele_pairs)
- Add prep_compare_editing_barchart() — packages amplicon read counts
  into kwargs for plot_quantification_comparison_barchart
- Add prep_compare_modification_positions() — computes Fisher exact
  tests + Bonferroni correction, returns plot kwargs + mod_df + sig counts
- Add prep_compare_allele_table() — merges allele DataFrames, computes
  LFC, returns top/bottom sorted kwargs for plot_alleles_table_compare
- Construct ComparePlotContext in CRISPRessoCompareCORE after loading
  both run infos
- Add if C2PRO_INSTALLED branch calling pro_hooks.on_compare_plots_complete()
- Refactor elif branch: CORE loads per-amplicon data onto context scope
  fields, calls prep functions for all 3 plot types
- Add Pro report hook
- 13 new unit tests for Compare prep functions including Fisher test
  correctness, LFC computation, and base_edit detection

* feat: Batch plot plugin — context, data prep, Pro hooks

CRISPResso2 side of the Batch plot plugin:

- Add mod_type scope field to BatchPlotContext
- Add 6 prep functions to data_prep.py (typed BatchPlotContext |
  AggregatePlotContext for reuse) plus _compute_sub_sgRNA_intervals helper
- Restructure CRISPRessoBatchCORE: accumulate per-amplicon data into
  dicts during aggregation loop, construct BatchPlotContext after loop,
  add Pro hook branch, gate inline plots on not C2PRO_INSTALLED
- Pro-only heatmap/line plots disabled in elif branch (moved to hook)
- Add Pro report hook
- 10 new unit tests for Batch prep functions

* feat: Aggregate plot plugin — context, data prep, Pro hooks

CRISPResso2 side of the Aggregate plot plugin:

- Add mod_type scope field to AggregatePlotContext
- Cherry-pick Batch prep functions (prep_batch_nuc_quilt*,
  prep_batch_allele_modification_*) and Pooled prep functions
  (prep_reads_total, prep_unmod_mod_pcts) for reuse
- Add group_column to prep_batch_nuc_quilt output (auto-detected
  from DataFrame — 'Batch' or 'Folder')
- Two-pass restructure of CRISPRessoAggregateCORE:
  Pass 1: per-amplicon data aggregation + CSV writes (no plotting)
  Pass 2: context construction + Pro hook or elif plotting with
  pagination (slice returned DataFrames per max_samples_per_summary_plot)
- Aggregate reads_total/unmod_mod_pcts use min_reads_for_inclusion
  (different from Pooled's min_reads_to_use_region), kept inline
- Pro report hook added
- Reuse Batch/Pooled unit tests via cherry-pick (512 total pass)

* Fix import in prep_base_edit_upset

* Fix base editing tests

* Don't pass `--pro` flag, as Pro detection is automatic

* Move caption generation into the prep functions

* Add pro only tests to CI

* Update data_files to be returned from data_prep functions

* Fix bugs and more missing data files

* Fixes for pro

* Extract batch/compare/aggregate elif branches into shared runners

The plot plugin architecture splits CRISPResso2 (CORE) and CRISPRessoPro
along an if C2PRO_INSTALLED: pro_hooks.on_*_plots_complete() vs
elif not args.suppress_plots: boundary.  Before this commit, the elif
branches contained ~100-130 lines of inline plotting logic per mode, which
Pro had no way to invoke — so when Pro was installed, those plots were
simply never generated.

Fix: extract each mode's elif body into a dedicated runner in the new
module CRISPResso2/plots/builtin_runners.py.  Both CORE's elif and
Pro's on_*_plots_complete hook call the same runner, so the two paths
produce the same on-disk output (data files + matplotlib PDFs) without
code duplication.

Runners added:

- run_builtin_batch_plots   — nucleotide quilts + conversion maps
- run_builtin_aggregate_plots — quilts with pagination across samples
- run_builtin_compare_plots — barchart / positions / allele tables +
  Fisher significance counts published via run_data['_sig_counts']

Also:

- Add all_summary_filenames and sub_nucleotide_*_summary_filename fields
  to BatchPlotContext and AggregatePlotContext so the runners can read
  all state from the context alone (no positional argument soup).
- Add data_prep.write_all_core_data_files and sub-helpers
  (write_core_amplicon_data_files, write_core_per_sgRNA_data_files,
  write_core_coding_data_files, write_core_hdr_data_files,
  write_core_prime_editing_data_files) that emit the CSV data files
  CORE's elif branch writes as a side-effect.  Pro's on_plots_complete
  hook calls write_all_core_data_files before running plugin plots so
  Pro mode produces the same Nucleotide_frequency_table.txt,
  Alleles_frequency_table_around_*.txt, amino_acid_table_*.txt,
  10i.*.txt scaffold insertion tables, etc. that non-Pro mode does.
- prep_reads_total / prep_unmod_mod_pcts gain an optional name
  override to preserve Aggregate's historical
  CRISPRessoAggregate_quantification_of_editing_frequency
  filename.  _resolve_min_reads_cutoff handles the Pooled/WGS
  (min_reads_to_use_region) vs Aggregate (min_reads_for_inclusion)
  arg naming split.
- CRISPRessoAggregateCORE's late summary plot block now runs
  unconditionally (CORE matplotlib plots instead of gated on
  !C2PRO_INSTALLED); a future Pro aggregate hook can replace it with
  interactive HTML versions.

Tests: 33/33 non-Pro + 35/35 Pro integration tests pass, 733 unit tests
pass, ruff clean.

* Fix Compare tiebreaker sort + add orjson to pro env for deterministic HTML

Two CI-only failures that local tests (without --diff-plots) didn't catch:

1. prep_compare_allele_table: the tiebreaker columns (Aligned_Sequence,
   Reference_Sequence) were being sorted descending when the original
   CRISPRessoCompareCORE.main() sorted them ascending.  That changed
   which alleles are selected by MAX_N_ROWS (and their order), producing
   different PDF text streams and failing diff-plots comparisons against
   the checked-in expected PDFs.  Revert to ascending tiebreakers to
   match the historical behavior.

2. pixi.toml: add orjson to the pro feature's pypi-dependencies.  Plotly
   falls back to Python's json.dumps when orjson is not present, which
   emits Unicode code points above 0x7F as \uXXXX escape sequences.
   orjson preserves raw UTF-8, matching the checked-in expected
   CRISPRessoPooled_*_summary.html fixtures that contain a raw 😀
   emoji in a sample name.

Local tests already pass without these fixes (CI's environment was the
distinguishing factor).

* core: inline builtin_runners bodies into CORE elif branches; drop conditional Pro imports

Restore the Pro/Core boundary per design_docs/MULTI_MODE_PLOT_PLUGIN.md:
CORE and Pro must not share iteration-logic modules.  The earlier
commit 16f86a2 extracted CORE's elif bodies into a shared
CRISPResso2/plots/builtin_runners.py module that both CORE's elif
and Pro's hooks imported — violating the mutually-exclusive-paths
contract.

This commit undoes the extraction on the CORE side, putting the
iteration code back inline in each mode's elif branch.  Pro's
equivalent code lives in CRISPRessoPro/plots/plot_runners.py (a
verbatim parallel copy — see preceding CRISPRessoPro commit).  The
duplication between CORE inline and Pro's plot_runners is intentional:
it's the cost the design accepts for the boundary, allowing the two
sides to evolve independently.

CORE changes:

- CRISPRessoBatchCORE.py: inline run_builtin_batch_plots body
  (~325 lines) into the elif branch.  Add missing imports:
  ProcessPoolExecutor, wait, partial, prep_batch_* functions.
  Replace the three _should_plot_large(...) calls with
  should_plot_large_plots(num_rows, False, False), the pre-existing
  correct function already in scope.  This fixes a latent regression
  where _should_plot_large always returned True (bypassing the
  sample-count gate — matplotlib could OOM on large batches).

- CRISPRessoCompareCORE.py: inline run_builtin_compare_plots body
  (~250 lines) into the else branch.  Add prep_compare_* imports.
  Compare's runner was synchronous (no ProcessPoolExecutor), so
  no multiprocessing imports are needed.  The Fisher/Bonferroni
  sig_counts are now local variables in the else branch; the
  post-branch read-back pulls from crispresso2_info only when
  C2PRO_INSTALLED (Pro's hook writes them there).

- CRISPRessoAggregateCORE.py: inline run_builtin_aggregate_plots
  body (~390 lines) into the elif branch.  Add prep_batch_nuc_quilt
  imports (prep_reads_total/unmod_mod_pcts were already imported).
  Also restructure the unconditional summary bar plot block at
  line 770-791: it is now wrapped in if C2PRO_INSTALLED / elif so
  CORE emits matplotlib PDFs and Pro emits plotly HTML via the
  new on_aggregate_summary_ready hook (see CRISPRessoPro commit).

Conditional-import cleanup (all 5 CORE modules):

- CRISPRessoBatchCORE.py, CRISPRessoCompareCORE.py, CRISPRessoPooledCORE.py,
  CRISPRessoWGSCORE.py, CRISPRessoAggregateCORE.py: remove the
  pre-plugin-architecture pattern:

      if args.use_matplotlib or not is_C2Pro_installed():
          from CRISPResso2.plots import CRISPRessoPlot
      else:
          from CRISPRessoPro import plot as CRISPRessoPlot

  Replace with unconditional matplotlib import.  In the new
  architecture, CORE's elif branch is only reached when Pro is not
  installed, so the else-import of CRISPRessoPro is unreachable
  dead code.  Removing it eliminates CORE's last references to
  CRISPRessoPro outside the if C2PRO_INSTALLED: pro_hooks.* blocks.

  Subtle consequence in Aggregate: the formerly-unconditional
  summary bar plot block (previously resolved to Pro.plot via the
  conditional) now always uses matplotlib.  The new
  on_aggregate_summary_ready Pro hook restores HTML output for
  Pro mode.

Delete CRISPResso2/plots/builtin_runners.py:

  Nothing imports from it after this commit.  Verified with:
    grep -rn 'from CRISPResso2.plots.builtin_runners' . → empty
    grep -rn 'from CRISPRessoPro' CRISPResso2/ → only inside
      'if C2PRO_INSTALLED:' blocks (hook delegation, legitimate)

Tests: 33/33 non-Pro + 35/35 Pro integration tests pass locally,
734 CRISPResso2 unit tests pass, 175 CRISPRessoPro unit tests
pass, ruff clean.

Note: the +554 LoC in CORE is pure code movement from the deleted
builtin_runners.py file (net neutral).  Review strategy:
git diff 16f86a2~1 HEAD -- CRISPRessoBatchCORE.py to compare
against the pre-extraction state.

* data_prep: sanitize sample names in summary plot kwargs; drop orjson

Make plot output deterministic across environments by sanitizing the
'Name' column of df_summary_quantification via CRISPRessoShared.slugify
before it's embedded into plot kwargs.  Previously, emoji and other
non-ASCII characters in sample names produced environment-dependent
HTML: plotly's orjson backend emits raw UTF-8 while the stdlib json
backend emits \uXXXX escapes, causing integration tests to flap
between local (has orjson) and CI (doesn't).

Changes to data_prep.py:

- New _sanitize_summary_names(df) helper — copies the DataFrame and
  applies slugify(str(name)) to the 'Name' column.  The source df is
  not mutated, so CSVs, log messages, and crispresso2_info keys still
  hold the raw user-supplied names.  Only the plot kwargs see the
  sanitized form.  Missing Name column, non-string entries (int, None,
  NaN), and empty DataFrames are all handled without crashing.

- prep_reads_total / prep_unmod_mod_pcts both wrap
  ctx.df_summary_quantification with _sanitize_summary_names() before
  returning.

Tests added (tests/unit_tests/test_data_prep.py):

- TestSanitizeSummaryNames class: 7 new tests covering ASCII pass-through,
  emoji stripping, source-not-mutated, non-string coercion, missing
  column, empty df, and propagation to prep_unmod_mod_pcts.

- TestPrepReadsTotal.test_dataframe_is_a_copy — updated from the
  previous 'is same reference' assertion which is no longer true.

pixi.toml changes:

- Drop orjson from [feature.pro.pypi-dependencies] — no longer needed
  once sanitization keeps output pure-ASCII.  (orjson is still present
  transitively as a kaleido dependency in local envs, but it's not a
  correctness requirement anymore.)

- Pin kaleido==0.2.1 — kaleido 1.x requires plotly >= 6.1.1 but the
  project is still on plotly 5.18.0.  Without the pin, pixi was
  auto-upgrading to kaleido 1.2.0 and static image export crashed.
  This matches the manual pip install in CI (pip install kaleido==0.2.1).

Tests: 33/33 non-Pro + 35/35 Pro integration tests pass locally.
741 CRISPResso2 unit tests pass (up from 734; +7 new sanitization
tests).  Ruff clean.

* Update n_processes and halt_on_plot_fail

* config: add Pro-only --config_json parsing and strict validation

* Fix how pixi manages environments, clean up the delination with pro and testing (#168)

* Fix how pixi manages environments, clean up the delination with pro and testing

* Fix how pixi environment is specified

* Fix when the repos are checked out

* Move CRISPRessoPro to sibling directory

* fix pixi

* Reduce docker size (#170)

* Limit copying the pixi environment

* docker: prune pixi runtime env and run without pixi

* pixi: use matplotlib-base and drop default compiler deps

* pixi: move plotly from default to pro feature

* Cache *.pyc files in Docker image

* Add `Aligned_Sequence` to compare output (#171)

* Add Aligned_Sequence to CRISPRessoCompare output

The index was set to None, which led to the Aligned_Sequence not being output

* Point to updated test branch

* Remove --cov from unit tests

* Point tests back to master

* ruff fix

* empty commit

* Attempt to fix CRISPRessoCompare test

* Add Aligned_Sequence to Compare output

* point tests to cole/fresh-plot-plugin

* point tests to cole/plot-plugin

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* remove index=None

* add back index=None

* make to_csv() args explicit

* empty commit

* empty commit

* empty commit

* empty

* empty

* empty

* empty

* empty

* When suppress_plots is True, write the data CRISPResso data files

* Fix bug where last base of sequence isn't plotted

* Simplify calling pro hooks

* Quote zipped results better and simplify is_C2Pro_installed

* Fix quantification window at reference end (#174)

* Move CRISPRessoShared import to shared global in data_prep.py

* Move find_closest_sgRNA_cut_point function from CORE into data_prep

* Move get_pe_scaffold_search into data_prep

* Add logger in data_prep

* remove comment

* remove comment

* remove comment

* remove old design doc

* remove README changes

* remove n_this_category_modified

* extract non-pro path to data_prep.write_core_amplicon_data_files()

* empty

* change key to "figures"

* point branch checkouts to master

---------

Co-authored-by: Samuel Nichols <Snic9004@gmail.com>
Co-authored-by: Cole Lyman <Cole@colelyman.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants