Plot Plugins#653
Open
Colelyman wants to merge 1 commit into
Open
Conversation
* Sam/pixi integration tests (#156) * Guardrail dropdowns (#155) * Add discovery of Python packages (#151) * Fix Docker entrypoint * Attempt to fix Circle CI pip install * Run each CRISPResso command in the conda environment * Update `--base_editor_output` parameter name for Circle CI * Update tests Makefile and batch expected output * Fix QWC inference across amplicons (#137) * Mckay/be plot improvements (#136) * trying to get the figure to fit nicely, increased element size to 100 * custom figsize to display without cutting off increased figsize in report template * Allow messages to be served in CLI reports (#134) (pinellolab#583) * Fix deletion at second position (#131) * Fix bug when there is a deletion starting at the second position This bug only happens when a deletion starts are the second position, before the fix, it would report that the deletion started at the first position. It is fixed now, so deletions at the second position are reported correctly. * Update CRISPRessoCOREResources.c due to change in .pyx * Add tests for find_indels_substitutions for deletions at the end * Fix 1bp deletions at the end, and off by one error This ensures that when a deletion occurs at the end of a read, the entire deletion is accounted for. * Update CRISPRessoCOREResources.c to reflect fixes for deletions at the end of alignments * Add extra asserts to deletion checks * Point to new test branch * Reafctor deletion_coordinates to go past the end of the string for deletions at the end of the sequence * Point tests to master * Allow messages to be served in CLI reports * Point to cole/messages test branch * Point tests back to master * point to tests branch * typo * testing github actions * remove test * point tests to master --------- Co-authored-by: Cole Lyman <Cole@colelyman.com> * Update inferred QWC tests to reflect correct intended behavior * Fix inferring QWC to match intended behavior * Add more test cases and fix bug discovered in single bp QWC * Add even more test cases testing indels outside the QWC * Point tests to cole/fix-qwc-deletion * update plotly.js (#138) * Change order of amplicon inference alignment so that 1st amplicon is the reference This makes a difference because it changes the values of `s1inds`, and therefore the value of the inferred quantification window coordinates. * Point integration tests back to master * Update CHANGELOG.md --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> * Update setup.py and pyproject.toml to find all modules and packages * Convert pyproject.toml to Unix file endings --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> * Guardrail dropdowns * Revert changes to .c files * Closed guardrail warning by default * .c fix * Revert .c files to match master Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert .c files back to master * Update CHANGELOG.md * Update CHANGELOG.md --------- Co-authored-by: Cole Lyman <Cole@colelyman.com> Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * Run integration tests * Fix pixi env * Remove pro from pixi.toml * Pull branches * Adding Pro token * Add pip install kaleido for pro tests * Pro compare * Pin plotly * Pin kaleido * Fix kaleido pin * Fix pixi plotly pin * Ruff fix * Update integration_tests.yml Remove branch ref for Pro * Run the diffs and print the results in integration tests * Point tests to master --------- Co-authored-by: Cole Lyman <Cole@colelyman.com> Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: add PlotContext dataclass to CRISPResso2 PlotContext is the only part of the plugin architecture that lives in CRISPResso2. It provides a read-only view into CORE's computed data that CRISPRessoPro and plugins consume to generate custom plots. - Zero-copy: wraps references to CORE's local variables - Scope fields (ref_name, sgRNA_ind) are mutable for iteration - HDR vectors and config fields have sensible defaults 6 passing tests. All 591 existing tests unaffected. * feat: add Pro hooks to CRISPRessoCORE and CRISPRessoShared CRISPRessoCORE.py: - Plot hook: After plot pool joins, constructs PlotContext and calls pro_hooks.on_plots_complete() when Pro is installed. Plugin plots run here. Errors handled per halt_on_plot_fail. - Report hook: Routes make_report to Pro when installed, falls back to CRISPRessoReport.make_report when not. CRISPRessoShared.py: - getCRISPRessoArgParser now loads Pro's args.json if CRISPRessoPro is installed, adding --report_config, --exclude_plots, etc. - Refactored arg loading into _add_args_from_dict helper. All 591 existing tests pass (590 + 1 env-dependent skip). Integration tested: Pro report generates correctly with --use_matplotlib. * docs: clarify PlotContext mutability contract Replace 'read-only' framing with an explicit warning that mutation will corrupt results. Python can't enforce immutability on dicts and arrays — frozen=True only prevents field reassignment, not mutation of the underlying data structures. Document it as a contract instead. * Add pro environment * refactor: rename CRISPResso2/plotting to CRISPResso2/plots Matches CRISPRessoPro/plots convention. Updated all imports in CRISPRessoCORE.py, plots/__init__.py, and test_plot_context.py. * feat: gate CORE plot generation through Pro registry When CRISPRessoPro is installed, builds the PlotRegistry before the plot generation block and wraps the plot() function. Each call checks the filename prefix from the args dict against the set of enabled prefixes derived from the registry. Disabled plots are skipped entirely — no rendering, no PNG on disk. The registry is built once, respects --report_config and --exclude_plots/--include_plots, and falls through gracefully if anything fails (all plots run as before). When Pro is NOT installed, plot() is unchanged — all plots generate. Covers all 5 filename root key variants used across the 38 built-in plots: fig_filename_root, fig_root, plot_root, plot_path, piechart_plot_root, barplot_plot_root. * revert: remove plot gating from CRISPRessoCORE The registry/config/prefix-matching logic was Pro implementation leaked into the open-source codebase. The correct approach per the design doc is CRISPRessoPlotData.py — extract data prep into pure functions so Pro can generate its own plots without any plugin machinery in CORE. * feat: extract Batch 1 data prep functions into CRISPRessoPlotData New module CRISPResso2/plots/data_prep.py with 5 pure functions: - prep_read_barplot (plot_1a) - prep_class_piechart_barplot (plot_1b/1c) - prep_allele_homology (plot_1e) - prep_amplicon_modifications (plot_4a) - prep_modification_frequency (plot_4b) All Pattern A (trivial pass-through). CORE now calls these instead of building dicts inline. Output is identical — verified by integration tests with diff against expected results. 10 new unit tests, 600 total passing. * refactor: revert trivial pass-through extractions, keep computed ones Removed prep_read_barplot, prep_class_piechart_barplot, and prep_allele_homology — these were pure packaging with zero reuse value. Pro can build those dicts directly from PlotContext. Kept prep_amplicon_modifications (computes y_max, builds plot_titles) and prep_modification_frequency (builds plot_title with ref_name logic) since they have computation Pro would otherwise duplicate. 5 unit tests, integration verified with diff. * fix: pass custom_config to PlotContext construction Pro data_funcs need custom_config['colors'] to generate plots. * refactor: use inline-snapshot for data_prep tests, pass custom_config to PlotContext * build: add inline-snapshot to test and pro pixi environments * build: ignore N815 lint rule for domain-specific mixed-case class vars * cleanup: empty plots/__init__.py, remove unused re-export * extract prep_indel_size_distribution (plot_3a) 99th percentile clipping of xmin/xmax from histogram data, with ±15 clamping. Also extracts _clip_to_percentile helper. 5 new unit tests, CORE wired to call data_prep function. Integration verified — output identical to expected. * extract prep_frequency_deletions_insertions (plot_3b) Triple 99th percentile clipping for ins/del/sub histograms using hdensity.sum() as the total. Adds total= parameter to _clip_to_percentile helper. 4 new unit tests, CORE wired to call data_prep function. Integration verified — output identical to expected. * refactor: use whole-dict snapshots in data_prep tests Wrap entire result in snapshot() instead of cherry-picking keys. Gives full coverage — any new/changed key auto-updates via --inline-snapshot=fix. Recursive _to_serializable handles numpy. * docs: add data prep extraction TODO list to CLAUDE.md * docs: move data prep TODOs from CLAUDE.md to pi todos * extract prep_dsODN_piechart (plot_1d) * extract prep_nucleotide_quilt (plot_2a) and prep_nucleotide_quilt_around_sgRNA (plot_2b) * extract prep_global_modifications_reference (plot_4e/4f) * extract prep_log_nuc_freqs (plot_10d) * extract prep_conversion_at_sel_nucs (plot_10e/10f/10g) * Run diffs on the plots in integration tests * Correct diff-plots now * Pin matplotlib in test dependencies and update running integration tests * Fix cross-pollution of integration tests * Install MS core fonts in CI for consistent plot rendering CRISPResso2 requests Arial as the primary font via matplotlib's font.sans-serif setting. Without MS core fonts installed, the CI runner falls back to a different font, causing upset plots (and potentially others) to exceed the RMSE image comparison threshold. This matches the Dockerfile which already installs ttf-mscorefonts-installer. Also clears matplotlib's font cache so it discovers the newly installed fonts. * Remove matplotlib pin in tests and don't run diff-plots with Pro * Reorder base edit plots and update caption in Compare * Rebuild pixi fontconfig cache after MS font installation in CI The pixi/conda environment has its own fontconfig with a separate font cache (.pixi/envs/test/var/cache/fontconfig/). When the pixi environment is restored from actions/cache, this fontconfig cache is stale and doesn't know about the MS core fonts installed via apt-get. Running the system fc-cache only updates the system fontconfig, not the pixi one. Fix: run 'pixi run -e test -- fc-cache -f' to rebuild the pixi fontconfig cache so that matplotlib's findSystemFonts() discovers Arial via the pixi fc-list. * Update base editing caption in compare * Sort Compare allele plots by aligned sequence as well * Attempt to fix sorting of compare allele plots * Sort the df being passed into the plot in compare! * Sort the bottom compare plots too! * Make the tiebreaking logic in compare enrichment consistent * Cache MS fonts in pixi * Point to pro plot-plugin branch * Extract prep_hdr_nucleotide_quilt (plot_4g) into data_prep.py Pattern C extraction: moves HDR cross-ref nucleotide quilt DataFrame assembly from CORE into a reusable data_prep function. The function builds multi-batch nuc_pct_df and mod_pct_df across all references with non-zero counts. CORE now calls the extracted function and reads back the DataFrames for CSV writes and plot input. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Extract prep_global_frameshift_data (plot_5a/6a/8a) into data_prep.py Pattern C extraction: moves global frameshift/splice/inframe count aggregation across all coding-seq refs from CORE into a reusable data_prep function. One shared aggregation feeds three plots: 5a (global frameshift pie), 6a (global frameshift profiles), and 8a (global splice sites). HDR special-case logic (adding unmodified reads to non-frameshift bucket) is preserved. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Extract prep_pe_nucleotide_quilt (plot_11a) into data_prep.py Pattern C extraction: moves PE cross-ref nucleotide quilt DataFrame assembly from CORE into a reusable data_prep function. Delegates to prep_hdr_nucleotide_quilt (same computation) and overrides quantification_window_idxs with the first reference's include_idxs (PE amplicons share the same quantification window, unlike HDR). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Extract prep_pe_nucleotide_quilt_around_sgRNA (plot_11b) into data_prep.py Pattern C extraction: moves PE window-slicing logic from CORE into a reusable data_prep function. The computation is identical to prep_nucleotide_quilt_around_sgRNA (plot_2b) but applied to multi-batch PE DataFrames from prep_pe_nucleotide_quilt. Delegates to the existing function to avoid duplication. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix ruff linting * Extract prep_alleles_around_cut (plot_9) into data_prep Pattern D extraction: separates pct adjustment, groupby collapse, and sgRNA interval recomputation from CORE's I/O and plot calls. Window clamping and CRISPRessoShared.get_dataframe_around_cut_asymmetrical stay in CORE; CSV writes and prep_alleles_table stay in CORE. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Extract prep_base_edit_quilt (plot_10h) into data_prep Pattern D extraction: separates pct adjustment, groupby collapse, sgRNA interval recomputation, and x_labels computation from CORE's I/O and plot calls. Similar to prep_alleles_around_cut but uses a symmetric window and computes conversion nucleotide positions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Extract prep_amino_acid_table (plot_9a) into data_prep Pattern D extraction: encapsulates amino acid conversion, cut point calculation, and CRISPRessoShared.get_amino_acid_dataframe call. Uses lazy import of CRISPRessoShared to keep data_prep's top-level imports clean. CSV write stays in CORE. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Audit cleanup: remove dead code, consolidate duplicates, document mutations 1. Delete dead to_numeric_ignore_columns from CRISPRessoCORE.py — all callers were extracted to data_prep.py which has its own copy 2. Fix duplicate test_import_from_module in test_plot_context.py — Python silently overwrote the first with the second 3. Consolidate prep_alleles_around_cut and prep_base_edit_quilt — Extract shared logic into _prep_windowed_alleles helper — Both functions now delegate to the helper, adding only their unique fields (new_cut_point vs x_labels) 4. Document DataFrame mutation contract — Module docstring updated to note the exception — _prep_windowed_alleles, prep_alleles_around_cut, and prep_base_edit_quilt all have .. warning:: blocks — New tests verify mutation identity (returned df IS input df) and in-place column modification behavior * Audit items 5-7: PlotContext nuc summary defaults, HDR vector wiring, remove _jp from data_prep 5. Make nucleotide_frequency_summary and nucleotide_percentage_summary optional in PlotContext (default_factory=dict). These are never populated by CRISPRessoCORE — they're reserved for Batch/Aggregate. Remove the always-{} arguments from the CORE construction call. 6. Wire ref1_all_* vectors into PlotContext construction when expected_hdr_amplicon_seq or prime_editing_pegRNA_extension_seq is set. Previously these fields always defaulted to {} even when the data was available. 7. Remove _jp parameter from prep_global_modifications_reference in data_prep.py. The caller (CORE) now computes plot_root and plot_title directly and passes them in, keeping path logic in CORE where it belongs. * Move Pro plotting to be before core plotting * Add parallel pytest dependency * Add N_READS_INPUT and N_READS_AFTER_PREPROCESSING to alignment_stats * Add the rest of the plots into the elif to be run on CRISPRessoPro * Fix unit tests by moving tests for _to_numeric_ignore_columns * Split up plot_class_piechart_and_barplot into separate functions * Refactor all plot data prep functions to use PlotContext * Minor clean up, rename fig_filename_root to plot_root and add types to plot_context * Fix linting errors * Point integrations tests to master * Point tests to cole/plot-plugin branch * Add a test-pro pixi environment * Add alt_nuc_counts_by_ref to PlotContext * Extract out more data prep functions and move amino_acids_to_numbers * Extract more data_prep functions * Replace plot_root with fig_filename_root * Remove _global_data and move imports * Extract out duplicated code and improve data_prep unit tests * Rename PlotContext to CorePlotContext and add contexts for other modes * Move base editing utilities from CRISPRessoCORE to data_prep; add tests - Move get_refpos_values, get_bp_substitutions, get_upset_plot_counts, get_base_edit_target_sequence, write_base_edit_counts out of CRISPRessoCORE.py into plots/data_prep.py, eliminating the circular import direction (data_prep no longer imports from CORE) - Extract inline plot_10i logic (~50 lines) from CRISPRessoCORE into prep_base_edit_upset; CORE now calls CRISPRessoPlotData.prep_base_edit_upset and passes gap_incentive from the ref dict - Add module-level logger = logging.getLogger(__name__) to data_prep.py following the established codebase pattern; remove the misnamed _base_edit_logger - Move all base editing utility tests from test_CRISPRessoCORE.py into test_data_prep.py as TestGetRefposValues, TestGetBpSubstitutions, TestGetBaseEditTargetSequence, TestGetUpsetPlotCounts, TestWriteBaseEditCounts; remove three duplicate/dead clusters from test_CRISPRessoCORE.py; write_base_edit_counts test now uses tmp_path - Add unit tests for prep_alleles_around_cut (pct recalculation, new_cut_point in/out of sgRNA interval, groupby collapse) and prep_amino_acid_table (plot_input populated above threshold, annotate_wildtype_allele label suffix) - Add sort-order comments in CRISPRessoCompareCORE.py explaining deterministic tiebreaker columns and LFC sort direction - Fix all ruff lint errors (F401 unused imports, E303 blank lines, D413 missing blank line after docstring section) * Pass plot_context to Pro make_report * Clean up some CRISPRessoCORE plotting parameters * cleanup: remove redundant variable unpacking, route plot 10i through plot() wrapper - Remove ref_seq, include_idxs_list, quantification_window_ref_seq, tot_aln_reads, cut_points, sgRNA_orig_sequences, sgRNA_names from per-ref loop top; move CSV-write variables into the crispresso1_mode block where they're used, replace others with refs[ref_name] lookups - Route plot_combination_upset through the plot() multiprocessing wrapper instead of calling it directly - Add design doc for multi-mode plot plugin work (all 5 remaining modes) * feat: Pooled plot plugin — context, data prep, Pro hooks CRISPResso2 side of the Pooled plot plugin: - Add prep_reads_total() and prep_unmod_mod_pcts() to data_prep.py with a prefix parameter for reuse across Pooled/WGS/Aggregate modes - Construct PooledPlotContext in CRISPRessoPooledCORE before plotting - Add if C2PRO_INSTALLED branch that calls pro_hooks.on_pooled_plots_complete() - Refactor elif branch to use prep functions instead of inline args - Add Pro report hook: pro_hooks.make_pooled_report() when Pro installed - Add unit tests for both prep functions including cross-context test with WGSPlotContext * fix: pass plot_context to make_pooled_report for config access * feat: WGS plot plugin — context, data prep, Pro hooks CRISPResso2 side of the WGS plot plugin: - Reuse prep_reads_total() and prep_unmod_mod_pcts() from data_prep.py (shared with Pooled, added to this branch via cherry-pick) - Construct WGSPlotContext in CRISPRessoWGSCORE before plotting - Add if C2PRO_INSTALLED branch calling pro_hooks.on_wgs_plots_complete() - Refactor elif branch to use prep functions - Add Pro report hook: pro_hooks.make_wgs_report() - Preserves WGS-specific metadata keys (reads_summary_plot, modification_summary_plot) for backward compat * feat: Compare plot plugin — context fields, data prep, Pro hooks CRISPResso2 side of the Compare plot plugin: - Add 11 mutable scope fields to ComparePlotContext for per-amplicon loaded data (profile_1/2, mod_freqs_1/2, consensus_sequence, quant_windows_1/2, cut_points, sgRNA_intervals, mod_type, allele_pairs) - Add prep_compare_editing_barchart() — packages amplicon read counts into kwargs for plot_quantification_comparison_barchart - Add prep_compare_modification_positions() — computes Fisher exact tests + Bonferroni correction, returns plot kwargs + mod_df + sig counts - Add prep_compare_allele_table() — merges allele DataFrames, computes LFC, returns top/bottom sorted kwargs for plot_alleles_table_compare - Construct ComparePlotContext in CRISPRessoCompareCORE after loading both run infos - Add if C2PRO_INSTALLED branch calling pro_hooks.on_compare_plots_complete() - Refactor elif branch: CORE loads per-amplicon data onto context scope fields, calls prep functions for all 3 plot types - Add Pro report hook - 13 new unit tests for Compare prep functions including Fisher test correctness, LFC computation, and base_edit detection * feat: Batch plot plugin — context, data prep, Pro hooks CRISPResso2 side of the Batch plot plugin: - Add mod_type scope field to BatchPlotContext - Add 6 prep functions to data_prep.py (typed BatchPlotContext | AggregatePlotContext for reuse) plus _compute_sub_sgRNA_intervals helper - Restructure CRISPRessoBatchCORE: accumulate per-amplicon data into dicts during aggregation loop, construct BatchPlotContext after loop, add Pro hook branch, gate inline plots on not C2PRO_INSTALLED - Pro-only heatmap/line plots disabled in elif branch (moved to hook) - Add Pro report hook - 10 new unit tests for Batch prep functions * feat: Aggregate plot plugin — context, data prep, Pro hooks CRISPResso2 side of the Aggregate plot plugin: - Add mod_type scope field to AggregatePlotContext - Cherry-pick Batch prep functions (prep_batch_nuc_quilt*, prep_batch_allele_modification_*) and Pooled prep functions (prep_reads_total, prep_unmod_mod_pcts) for reuse - Add group_column to prep_batch_nuc_quilt output (auto-detected from DataFrame — 'Batch' or 'Folder') - Two-pass restructure of CRISPRessoAggregateCORE: Pass 1: per-amplicon data aggregation + CSV writes (no plotting) Pass 2: context construction + Pro hook or elif plotting with pagination (slice returned DataFrames per max_samples_per_summary_plot) - Aggregate reads_total/unmod_mod_pcts use min_reads_for_inclusion (different from Pooled's min_reads_to_use_region), kept inline - Pro report hook added - Reuse Batch/Pooled unit tests via cherry-pick (512 total pass) * Fix import in prep_base_edit_upset * Fix base editing tests * Don't pass `--pro` flag, as Pro detection is automatic * Move caption generation into the prep functions * Add pro only tests to CI * Update data_files to be returned from data_prep functions * Fix bugs and more missing data files * Fixes for pro * Extract batch/compare/aggregate elif branches into shared runners The plot plugin architecture splits CRISPResso2 (CORE) and CRISPRessoPro along an if C2PRO_INSTALLED: pro_hooks.on_*_plots_complete() vs elif not args.suppress_plots: boundary. Before this commit, the elif branches contained ~100-130 lines of inline plotting logic per mode, which Pro had no way to invoke — so when Pro was installed, those plots were simply never generated. Fix: extract each mode's elif body into a dedicated runner in the new module CRISPResso2/plots/builtin_runners.py. Both CORE's elif and Pro's on_*_plots_complete hook call the same runner, so the two paths produce the same on-disk output (data files + matplotlib PDFs) without code duplication. Runners added: - run_builtin_batch_plots — nucleotide quilts + conversion maps - run_builtin_aggregate_plots — quilts with pagination across samples - run_builtin_compare_plots — barchart / positions / allele tables + Fisher significance counts published via run_data['_sig_counts'] Also: - Add all_summary_filenames and sub_nucleotide_*_summary_filename fields to BatchPlotContext and AggregatePlotContext so the runners can read all state from the context alone (no positional argument soup). - Add data_prep.write_all_core_data_files and sub-helpers (write_core_amplicon_data_files, write_core_per_sgRNA_data_files, write_core_coding_data_files, write_core_hdr_data_files, write_core_prime_editing_data_files) that emit the CSV data files CORE's elif branch writes as a side-effect. Pro's on_plots_complete hook calls write_all_core_data_files before running plugin plots so Pro mode produces the same Nucleotide_frequency_table.txt, Alleles_frequency_table_around_*.txt, amino_acid_table_*.txt, 10i.*.txt scaffold insertion tables, etc. that non-Pro mode does. - prep_reads_total / prep_unmod_mod_pcts gain an optional name override to preserve Aggregate's historical CRISPRessoAggregate_quantification_of_editing_frequency filename. _resolve_min_reads_cutoff handles the Pooled/WGS (min_reads_to_use_region) vs Aggregate (min_reads_for_inclusion) arg naming split. - CRISPRessoAggregateCORE's late summary plot block now runs unconditionally (CORE matplotlib plots instead of gated on !C2PRO_INSTALLED); a future Pro aggregate hook can replace it with interactive HTML versions. Tests: 33/33 non-Pro + 35/35 Pro integration tests pass, 733 unit tests pass, ruff clean. * Fix Compare tiebreaker sort + add orjson to pro env for deterministic HTML Two CI-only failures that local tests (without --diff-plots) didn't catch: 1. prep_compare_allele_table: the tiebreaker columns (Aligned_Sequence, Reference_Sequence) were being sorted descending when the original CRISPRessoCompareCORE.main() sorted them ascending. That changed which alleles are selected by MAX_N_ROWS (and their order), producing different PDF text streams and failing diff-plots comparisons against the checked-in expected PDFs. Revert to ascending tiebreakers to match the historical behavior. 2. pixi.toml: add orjson to the pro feature's pypi-dependencies. Plotly falls back to Python's json.dumps when orjson is not present, which emits Unicode code points above 0x7F as \uXXXX escape sequences. orjson preserves raw UTF-8, matching the checked-in expected CRISPRessoPooled_*_summary.html fixtures that contain a raw 😀 emoji in a sample name. Local tests already pass without these fixes (CI's environment was the distinguishing factor). * core: inline builtin_runners bodies into CORE elif branches; drop conditional Pro imports Restore the Pro/Core boundary per design_docs/MULTI_MODE_PLOT_PLUGIN.md: CORE and Pro must not share iteration-logic modules. The earlier commit 16f86a2 extracted CORE's elif bodies into a shared CRISPResso2/plots/builtin_runners.py module that both CORE's elif and Pro's hooks imported — violating the mutually-exclusive-paths contract. This commit undoes the extraction on the CORE side, putting the iteration code back inline in each mode's elif branch. Pro's equivalent code lives in CRISPRessoPro/plots/plot_runners.py (a verbatim parallel copy — see preceding CRISPRessoPro commit). The duplication between CORE inline and Pro's plot_runners is intentional: it's the cost the design accepts for the boundary, allowing the two sides to evolve independently. CORE changes: - CRISPRessoBatchCORE.py: inline run_builtin_batch_plots body (~325 lines) into the elif branch. Add missing imports: ProcessPoolExecutor, wait, partial, prep_batch_* functions. Replace the three _should_plot_large(...) calls with should_plot_large_plots(num_rows, False, False), the pre-existing correct function already in scope. This fixes a latent regression where _should_plot_large always returned True (bypassing the sample-count gate — matplotlib could OOM on large batches). - CRISPRessoCompareCORE.py: inline run_builtin_compare_plots body (~250 lines) into the else branch. Add prep_compare_* imports. Compare's runner was synchronous (no ProcessPoolExecutor), so no multiprocessing imports are needed. The Fisher/Bonferroni sig_counts are now local variables in the else branch; the post-branch read-back pulls from crispresso2_info only when C2PRO_INSTALLED (Pro's hook writes them there). - CRISPRessoAggregateCORE.py: inline run_builtin_aggregate_plots body (~390 lines) into the elif branch. Add prep_batch_nuc_quilt imports (prep_reads_total/unmod_mod_pcts were already imported). Also restructure the unconditional summary bar plot block at line 770-791: it is now wrapped in if C2PRO_INSTALLED / elif so CORE emits matplotlib PDFs and Pro emits plotly HTML via the new on_aggregate_summary_ready hook (see CRISPRessoPro commit). Conditional-import cleanup (all 5 CORE modules): - CRISPRessoBatchCORE.py, CRISPRessoCompareCORE.py, CRISPRessoPooledCORE.py, CRISPRessoWGSCORE.py, CRISPRessoAggregateCORE.py: remove the pre-plugin-architecture pattern: if args.use_matplotlib or not is_C2Pro_installed(): from CRISPResso2.plots import CRISPRessoPlot else: from CRISPRessoPro import plot as CRISPRessoPlot Replace with unconditional matplotlib import. In the new architecture, CORE's elif branch is only reached when Pro is not installed, so the else-import of CRISPRessoPro is unreachable dead code. Removing it eliminates CORE's last references to CRISPRessoPro outside the if C2PRO_INSTALLED: pro_hooks.* blocks. Subtle consequence in Aggregate: the formerly-unconditional summary bar plot block (previously resolved to Pro.plot via the conditional) now always uses matplotlib. The new on_aggregate_summary_ready Pro hook restores HTML output for Pro mode. Delete CRISPResso2/plots/builtin_runners.py: Nothing imports from it after this commit. Verified with: grep -rn 'from CRISPResso2.plots.builtin_runners' . → empty grep -rn 'from CRISPRessoPro' CRISPResso2/ → only inside 'if C2PRO_INSTALLED:' blocks (hook delegation, legitimate) Tests: 33/33 non-Pro + 35/35 Pro integration tests pass locally, 734 CRISPResso2 unit tests pass, 175 CRISPRessoPro unit tests pass, ruff clean. Note: the +554 LoC in CORE is pure code movement from the deleted builtin_runners.py file (net neutral). Review strategy: git diff 16f86a2~1 HEAD -- CRISPRessoBatchCORE.py to compare against the pre-extraction state. * data_prep: sanitize sample names in summary plot kwargs; drop orjson Make plot output deterministic across environments by sanitizing the 'Name' column of df_summary_quantification via CRISPRessoShared.slugify before it's embedded into plot kwargs. Previously, emoji and other non-ASCII characters in sample names produced environment-dependent HTML: plotly's orjson backend emits raw UTF-8 while the stdlib json backend emits \uXXXX escapes, causing integration tests to flap between local (has orjson) and CI (doesn't). Changes to data_prep.py: - New _sanitize_summary_names(df) helper — copies the DataFrame and applies slugify(str(name)) to the 'Name' column. The source df is not mutated, so CSVs, log messages, and crispresso2_info keys still hold the raw user-supplied names. Only the plot kwargs see the sanitized form. Missing Name column, non-string entries (int, None, NaN), and empty DataFrames are all handled without crashing. - prep_reads_total / prep_unmod_mod_pcts both wrap ctx.df_summary_quantification with _sanitize_summary_names() before returning. Tests added (tests/unit_tests/test_data_prep.py): - TestSanitizeSummaryNames class: 7 new tests covering ASCII pass-through, emoji stripping, source-not-mutated, non-string coercion, missing column, empty df, and propagation to prep_unmod_mod_pcts. - TestPrepReadsTotal.test_dataframe_is_a_copy — updated from the previous 'is same reference' assertion which is no longer true. pixi.toml changes: - Drop orjson from [feature.pro.pypi-dependencies] — no longer needed once sanitization keeps output pure-ASCII. (orjson is still present transitively as a kaleido dependency in local envs, but it's not a correctness requirement anymore.) - Pin kaleido==0.2.1 — kaleido 1.x requires plotly >= 6.1.1 but the project is still on plotly 5.18.0. Without the pin, pixi was auto-upgrading to kaleido 1.2.0 and static image export crashed. This matches the manual pip install in CI (pip install kaleido==0.2.1). Tests: 33/33 non-Pro + 35/35 Pro integration tests pass locally. 741 CRISPResso2 unit tests pass (up from 734; +7 new sanitization tests). Ruff clean. * Update n_processes and halt_on_plot_fail * config: add Pro-only --config_json parsing and strict validation * Fix how pixi manages environments, clean up the delination with pro and testing (#168) * Fix how pixi manages environments, clean up the delination with pro and testing * Fix how pixi environment is specified * Fix when the repos are checked out * Move CRISPRessoPro to sibling directory * fix pixi * Reduce docker size (#170) * Limit copying the pixi environment * docker: prune pixi runtime env and run without pixi * pixi: use matplotlib-base and drop default compiler deps * pixi: move plotly from default to pro feature * Cache *.pyc files in Docker image * Add `Aligned_Sequence` to compare output (#171) * Add Aligned_Sequence to CRISPRessoCompare output The index was set to None, which led to the Aligned_Sequence not being output * Point to updated test branch * Remove --cov from unit tests * Point tests back to master * ruff fix * empty commit * Attempt to fix CRISPRessoCompare test * Add Aligned_Sequence to Compare output * point tests to cole/fresh-plot-plugin * point tests to cole/plot-plugin * empty commit * empty commit * empty commit * empty commit * empty commit * remove index=None * add back index=None * make to_csv() args explicit * empty commit * empty commit * empty commit * empty * empty * empty * empty * empty * When suppress_plots is True, write the data CRISPResso data files * Fix bug where last base of sequence isn't plotted * Simplify calling pro hooks * Quote zipped results better and simplify is_C2Pro_installed * Fix quantification window at reference end (#174) * Move CRISPRessoShared import to shared global in data_prep.py * Move find_closest_sgRNA_cut_point function from CORE into data_prep * Move get_pe_scaffold_search into data_prep * Add logger in data_prep * remove comment * remove comment * remove comment * remove old design doc * remove README changes * remove n_this_category_modified * extract non-pro path to data_prep.write_core_amplicon_data_files() * empty * change key to "figures" * point branch checkouts to master --------- Co-authored-by: Samuel Nichols <Snic9004@gmail.com> Co-authored-by: Cole Lyman <Cole@colelyman.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR extracts the data preparation for the CRISPResso plots into individual functions, so that they can be called independently, and make the plot plugins in CRISPRessoPro possible.