Context
The INFlow pipeline (nf-core/denovoproteomics) has an ASSEMBLE_PSM_TABLE step that merges per-sample Winnow outputs into a single experiment-level PSM table. This is currently implemented as an inline Python script in the Nextflow module (modules/local/assemble_psm_table/main.nf).
Since the inputs are Winnow's own outputs, this should be a Winnow CLI command (e.g. winnow assemble) rather than pipeline-embedded logic.
What the inline script does
- Reads all per-sample *_preds_and_fdr_metrics.csv files (Winnow FDR output)
- Reads all per-sample *_metadata.csv files (Winnow metadata output)
- Merges each pair on spectrum_id (inner join)
- Extracts experiment_file from spectrum_id (format filename:scan), or uses the experiment_name column
- Concatenates all samples into one experiment-level table
- Drops Winnow internal feature columns that contain serialised arrays (e.g. mz_array, intensity_array, token_log_probs, theoretical_mz), because these break downstream TSV consumers
- Writes psms.tsv (tab-separated)
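For reference, the steps above can be sketched roughly as follows. This is not the inline script itself: it uses only the standard library csv module so it runs without dependencies, and the function names (read_csv, merge_sample, assemble), the ARRAY_COLUMNS set, and the keep_arrays parameter are all hypothetical. It also assumes experiment_name takes precedence when present, with the spectrum_id prefix as the fallback.

```python
import csv

# Assumed subset of serialised-array columns; the authoritative list is in the inline script.
ARRAY_COLUMNS = {"mz_array", "intensity_array", "token_log_probs", "theoretical_mz"}


def read_csv(path):
    """Read a CSV file into a list of row dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def merge_sample(fdr_rows, metadata_rows):
    """Inner-join one sample's FDR rows and metadata rows on spectrum_id."""
    metadata_by_id = {row["spectrum_id"]: row for row in metadata_rows}
    merged = []
    for fdr_row in fdr_rows:
        meta_row = metadata_by_id.get(fdr_row["spectrum_id"])
        if meta_row is None:
            continue  # inner join: skip spectra that have no metadata row
        combined = {**meta_row, **fdr_row}  # FDR values win on shared column names
        # Assumption: prefer experiment_name, else the filename part of "filename:scan".
        combined["experiment_file"] = (
            combined.get("experiment_name")
            or combined["spectrum_id"].rsplit(":", 1)[0]
        )
        merged.append(combined)
    return merged


def assemble(sample_pairs, output_path, keep_arrays=False):
    """Concatenate merged samples and write one tab-separated PSM table."""
    rows = []
    for fdr_path, metadata_path in sample_pairs:
        rows.extend(merge_sample(read_csv(fdr_path), read_csv(metadata_path)))
    # Collect columns in first-seen order, dropping array columns unless kept.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns and (keep_arrays or name not in ARRAY_COLUMNS):
                columns.append(name)
    with open(output_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t",
                                extrasaction="ignore", restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Array columns are dropped by name here; a real implementation might instead detect them by inspecting values, which would cope with feature columns added in later Winnow versions.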
Proposed CLI
winnow assemble \
--fdr-results sample1_preds_and_fdr_metrics.csv sample2_preds_and_fdr_metrics.csv \
--metadata sample1_metadata.csv sample2_metadata.csv \
--output psms.tsv
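One open question with the explicit-file form is how FDR and metadata files get paired. A minimal sketch, assuming they are paired by position and that a count mismatch should fail early; argparse is used purely for illustration (Winnow's CLI may be built differently), and build_parser and parse_pairs are hypothetical names:

```python
import argparse


def build_parser():
    """Hypothetical parser for the explicit-file form of `winnow assemble`."""
    parser = argparse.ArgumentParser(prog="winnow assemble")
    parser.add_argument("--fdr-results", nargs="+", required=True,
                        help="per-sample *_preds_and_fdr_metrics.csv files")
    parser.add_argument("--metadata", nargs="+", required=True,
                        help="per-sample *_metadata.csv files, in the same order")
    parser.add_argument("--output", required=True, help="path of the TSV to write")
    return parser


def parse_pairs(argv):
    """Parse argv and pair FDR and metadata files by position (an assumption)."""
    args = build_parser().parse_args(argv)
    if len(args.fdr_results) != len(args.metadata):
        raise SystemExit("--fdr-results and --metadata need the same number of files")
    return list(zip(args.fdr_results, args.metadata)), args.output
```

Pairing by the shared sample prefix instead of by position would be more robust to argument order, at the cost of relying on the file naming convention.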
Or with directory input:
winnow assemble \
--input-dir results/ \
--output psms.tsv
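For the directory form, the command would need to discover and pair files itself. A rough sketch, assuming a non-recursive search, pairing by the sample prefix before each suffix, and a hard error when a sample has only one of its two files; discover_sample_pairs is a hypothetical helper, not an existing Winnow function:

```python
from pathlib import Path

FDR_SUFFIX = "_preds_and_fdr_metrics.csv"
METADATA_SUFFIX = "_metadata.csv"


def discover_sample_pairs(input_dir):
    """Return sorted (sample, fdr_path, metadata_path) tuples found in input_dir."""
    input_dir = Path(input_dir)
    # Key each file by its sample prefix, i.e. the file name minus the suffix.
    fdr = {p.name[: -len(FDR_SUFFIX)]: p
           for p in input_dir.glob("*" + FDR_SUFFIX)}
    meta = {p.name[: -len(METADATA_SUFFIX)]: p
            for p in input_dir.glob("*" + METADATA_SUFFIX)}
    # Samples present in only one of the two sets cannot be merged.
    unpaired = sorted(set(fdr) ^ set(meta))
    if unpaired:
        raise ValueError(f"samples missing an FDR or metadata file: {unpaired}")
    return [(sample, fdr[sample], meta[sample]) for sample in sorted(fdr)]
```

Whether unpaired samples should raise or merely warn is a design decision for the real command; raising is chosen here so that a missing file cannot silently shrink the table.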
Acceptance criteria
- Drops array columns by default (--keep-arrays flag to override)
- Handles a missing experiment_name column gracefully (falls back to spectrum_id prefix)