feat: add CLI command for PSM table assembly from Winnow output #197

@BioGeek

Context

The INFlow pipeline (nf-core/denovoproteomics) has an ASSEMBLE_PSM_TABLE step that merges per-sample Winnow outputs into a single experiment-level PSM table. This is currently implemented as an inline Python script in the Nextflow module (modules/local/assemble_psm_table/main.nf).

Since the inputs are Winnow's own outputs, this should be a Winnow CLI command (e.g. winnow assemble) rather than pipeline-embedded logic.

What the inline script does

  1. Reads all per-sample *_preds_and_fdr_metrics.csv files (Winnow FDR output)
  2. Reads all per-sample *_metadata.csv files (Winnow metadata output)
  3. Merges each pair on spectrum_id (inner join)
  4. Sets experiment_file from the experiment_name column when it is present, otherwise from the spectrum_id prefix (format filename:scan)
  5. Concatenates all samples into one experiment-level table
  6. Drops Winnow internal feature columns that contain serialised arrays (e.g. mz_array, intensity_array, token_log_probs, theoretical_mz), because these break downstream TSV consumers
  7. Writes psms.tsv (tab-separated)
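
For discussion, the steps above can be sketched roughly as follows. This is not the inline script from the Nextflow module. It is a minimal standard-library sketch that assumes rows have already been read into dicts (for example with csv.DictReader). The function names, the ARRAY_COLUMNS set (only the four column names listed above), and the precedence of experiment_name over the spectrum_id prefix are assumptions.

```python
import csv
import io

# Assumption: only the four array columns named in this issue; the real list is longer.
ARRAY_COLUMNS = {"mz_array", "intensity_array", "token_log_probs", "theoretical_mz"}


def merge_sample(fdr_rows, metadata_rows):
    """Inner-join one sample's FDR rows and metadata rows on spectrum_id (step 3)."""
    meta_by_id = {row["spectrum_id"]: row for row in metadata_rows}
    merged = []
    for row in fdr_rows:
        meta = meta_by_id.get(row["spectrum_id"])
        if meta is None:
            continue  # inner join: skip spectra without metadata
        merged.append({**meta, **row})
    return merged


def experiment_file(row):
    """Use experiment_name if present, else the spectrum_id prefix (step 4)."""
    name = row.get("experiment_name")
    if name:
        return name
    # spectrum_id has the form filename:scan; split on the last colon only
    return row["spectrum_id"].rsplit(":", 1)[0]


def assemble(samples, keep_arrays=False):
    """Concatenate merged samples and drop array columns (steps 5 and 6).

    samples is an iterable of (fdr_rows, metadata_rows) pairs.
    """
    table = []
    for fdr_rows, metadata_rows in samples:
        for row in merge_sample(fdr_rows, metadata_rows):
            row["experiment_file"] = experiment_file(row)
            if not keep_arrays:
                row = {k: v for k, v in row.items() if k not in ARRAY_COLUMNS}
            table.append(row)
    return table


def write_tsv(rows, handle):
    """Write rows as tab-separated text (step 7), using the union of all columns."""
    columns = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(
        handle, fieldnames=columns, delimiter="\t", restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
```

Taking the union of columns in write_tsv means that samples with slightly different column sets still produce one consistent header, with empty cells where a sample lacks a column.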

Proposed CLI

winnow assemble \
    --fdr-results sample1_preds_and_fdr_metrics.csv sample2_preds_and_fdr_metrics.csv \
    --metadata sample1_metadata.csv sample2_metadata.csv \
    --output psms.tsv

Or with directory input:

winnow assemble \
    --input-dir results/ \
    --output psms.tsv
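
To make the two invocation styles concrete, here is one possible argument-handling sketch. It uses argparse purely for illustration; whichever CLI framework Winnow already uses would replace it. The helper names (build_parser, pair_inputs, resolve_inputs) and the rule of pairing files by the sample prefix before the known suffixes are assumptions. The --keep-arrays flag comes from the acceptance criteria in this issue.

```python
import argparse
from pathlib import Path

# File-name suffixes of the per-sample Winnow outputs named in this issue.
FDR_SUFFIX = "_preds_and_fdr_metrics.csv"
META_SUFFIX = "_metadata.csv"


def build_parser():
    """Build a parser accepting either explicit file lists or --input-dir."""
    parser = argparse.ArgumentParser(prog="winnow assemble")
    parser.add_argument("--fdr-results", nargs="+", type=Path, default=[])
    parser.add_argument("--metadata", nargs="+", type=Path, default=[])
    parser.add_argument("--input-dir", type=Path)
    parser.add_argument("--output", type=Path, required=True)
    parser.add_argument("--keep-arrays", action="store_true")
    return parser


def sample_name(path, suffix):
    """Return the sample prefix of a file name, or raise if the suffix is missing."""
    if not path.name.endswith(suffix):
        raise ValueError(f"{path} does not end with {suffix}")
    return path.name[: -len(suffix)]


def pair_inputs(fdr_paths, metadata_paths):
    """Pair FDR and metadata files by sample prefix, independent of argument order."""
    fdr = {sample_name(p, FDR_SUFFIX): p for p in fdr_paths}
    meta = {sample_name(p, META_SUFFIX): p for p in metadata_paths}
    if fdr.keys() != meta.keys():
        unpaired = sorted(fdr.keys() ^ meta.keys())
        raise ValueError(f"samples without a matching pair: {unpaired}")
    return [(fdr[name], meta[name]) for name in sorted(fdr)]


def resolve_inputs(args):
    """Collect (fdr, metadata) path pairs from --input-dir or the explicit lists."""
    if args.input_dir is not None:
        fdr_paths = list(args.input_dir.glob("*" + FDR_SUFFIX))
        metadata_paths = list(args.input_dir.glob("*" + META_SUFFIX))
    else:
        fdr_paths, metadata_paths = args.fdr_results, args.metadata
    return pair_inputs(fdr_paths, metadata_paths)
```

Pairing by prefix rather than by position avoids silently merging the wrong files when the two lists are passed in different orders, and it works for a single sample as well as many.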

Acceptance criteria

  • CLI command that takes Winnow FDR + metadata CSVs and produces a merged PSM table
  • Drops serialised array columns by default (with --keep-arrays flag to override)
  • Tab-separated output by default
  • Handles missing experiment_name column gracefully (falls back to spectrum_id prefix)
  • Works with both single-sample and multi-sample inputs
