Add a fuzzy matching parameter (--max-barcode-errors) to the demux algorithm for higher barcode demux classifications by uzbit · Pull Request #1580 · nanoporetech/dorado

uzbit · 2026-03-13T21:35:51Z

It was pointed out to me that the default demux for dorado is not classifying many reads due to exact matching. Perhaps I'm missing something, but judging from the application's help docs, there aren't any flags to modify demux parameters. This PR adds --max-barcode-errors N to allow fuzzy barcode matching with up to N errors, measured as edit distance (indels + substitutions). See below for test results. Note in particular that with my test data and --max-barcode-errors 3, 84.3% of reads are classified into barcodes, versus 4.9% with the default parameters.
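For readers unfamiliar with edit-distance matching, here is a minimal sketch of the general idea: find the lowest edit distance between a barcode and any substring of the read, and accept the best barcode only if it is within N errors. This is not the code in this PR and not dorado's algorithm. The function names, the toy barcode sequences, and the tie-breaking rule are all assumptions made for illustration.

```python
def min_edit_distance_infix(barcode, read):
    """Smallest edit distance between `barcode` and any substring of `read`."""
    # Row 0 is all zeros so that the alignment may start anywhere in the read.
    prev = [0] * (len(read) + 1)
    for i, b in enumerate(barcode, start=1):
        curr = [i] + [0] * len(read)
        for j, r in enumerate(read, start=1):
            curr[j] = min(
                prev[j - 1] + (b != r),  # match or substitution
                prev[j] + 1,             # barcode base missing from the read
                curr[j - 1] + 1,         # extra base in the read
            )
        prev = curr
    # The alignment may also end anywhere in the read.
    return min(prev)


def classify(read, barcodes, max_errors):
    """Return the best barcode name, or 'unclassified' (hypothetical rule)."""
    scores = {name: min_edit_distance_infix(seq, read) for name, seq in barcodes.items()}
    best = min(scores, key=scores.get)
    ties = [name for name, score in scores.items() if score == scores[best]]
    # Reject hits over the error budget, and ambiguous hits with equal scores.
    if scores[best] > max_errors or len(ties) > 1:
        return "unclassified"
    return best


# Toy sequences, not real kit barcodes.
barcodes = {"bc01": "ACGTACGTAC", "bc02": "TTGGCCAATT"}
read = "GGG" + "ACGTTCGTAC" + "CCCC"  # bc01 with one substitution

print(classify(read, barcodes, max_errors=0))  # -> unclassified
print(classify(read, barcodes, max_errors=1))  # -> bc01
```

Note that a sketch like this searches the whole read, so it would happily match a barcode in the middle of a strand.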

DEMUX DEFAULT v1.4:

../../../dorado/build/bin/dorado demux --kit-name SQK-NBD114-96 -v --emit-fastq --emit-summary --no-trim -o demuxed_reads_default/ calls_v1.4.bam
[2026-03-13 15:21:34.337] [info] Running: "demux" "--kit-name" "SQK-NBD114-96" "-v" "--emit-fastq" "--emit-summary" "--no-trim" "-o" "demuxed_reads_default/" "calls_v1.4.bam"
[2026-03-13 15:21:34.338] [info] num input files: 1
[2026-03-13 15:21:34.338] [debug] > barcoding threads 15, writer threads 1
[2026-03-13 15:21:34.338] [info] - Note: FASTQ output is not recommended as not all data can be preserved.
[2026-03-13 15:21:34.338] [debug] Creating output folder: 'demuxed_reads_default'. Length:44
[2026-03-13 15:21:34.350] [info] > starting barcode demuxing
[2026-03-13 15:21:34.351] [debug] > Kits to evaluate: 1
[2026-03-13 15:21:35.342] [debug] Processed 50000 reads
[2026-03-13 15:21:36.792] [debug] Processed 100000 reads
[2026-03-13 15:21:38.228] [debug] Processed 150000 reads
[2026-03-13 15:21:39.676] [debug] Processed 200000 reads
[2026-03-13 15:21:41.073] [debug] Processed 250000 reads
[2026-03-13 15:21:42.552] [debug] Processed 300000 reads
[2026-03-13 15:21:43.967] [debug] Processed 350000 reads
[2026-03-13 15:21:45.405] [debug] Processed 400000 reads
[2026-03-13 15:21:46.817] [debug] Processed 450000 reads
[2026-03-13 15:21:48.249] [debug] Processed 500000 reads
[2026-03-13 15:21:49.708] [debug] Processed 550000 reads
[2026-03-13 15:21:51.205] [debug] Processed 600000 reads
[2026-03-13 15:21:52.651] [debug] Processed 650000 reads
[2026-03-13 15:21:54.083] [debug] Processed 700000 reads
[2026-03-13 15:21:55.570] [debug] Processed 750000 reads
[2026-03-13 15:21:57.026] [debug] Processed 800000 reads
[2026-03-13 15:21:58.485] [debug] Processed 850000 reads
[2026-03-13 15:21:59.739] [debug] Total reads processed: 892590
[2026-03-13 15:22:00.191] [info] > Finished in (ms): 25853
[2026-03-13 15:22:00.191] [info] > Reads written: 892590
[2026-03-13 15:22:00.191] [info] > 892590 reads demuxed @ classifications/s: 3.452559e+04
[2026-03-13 15:22:00.191] [debug] Barcode distribution :
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode23 : 1
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode37 : 1
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode41 : 9375
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode42 : 1130
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode43 : 362
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode44 : 7525
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode45 : 73
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode46 : 212
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode47 : 95
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode48 : 7
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode49 : 78
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode50 : 127
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode51 : 3837
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode52 : 3646
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode53 : 1032
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode54 : 185
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode55 : 355
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode56 : 2996
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode57 : 2987
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode58 : 29
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode59 : 4338
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode60 : 95
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode61 : 3549
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode62 : 146
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode63 : 1399
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode64 : 76
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode65 : 1
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode66 : 1
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode67 : 2
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode69 : 1
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode79 : 1
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode84 : 4
[2026-03-13 15:22:00.191] [debug] SQK-NBD114-96_barcode90 : 6
[2026-03-13 15:22:00.191] [debug] unclassified : 848918
[2026-03-13 15:22:00.191] [debug] Classified rate 4.8927307%
[2026-03-13 15:22:00.191] [info] > finished barcode demuxing

NEW FUZZY BARCODE MATCH v1.4:

../../../dorado/build/bin/dorado demux --kit-name SQK-NBD114-96 -v --emit-fastq --emit-summary --no-trim --max-barcode-errors 3 -o demuxed_reads_fuzzy/ calls_v1.4.bam
[2026-03-13 15:27:48.892] [info] Running: "demux" "--kit-name" "SQK-NBD114-96" "-v" "--emit-fastq" "--emit-summary" "--no-trim" "--max-barcode-errors" "3" "-o" "demuxed_reads_fuzzy/" "calls_v1.4.bam"
[2026-03-13 15:27:48.893] [info] num input files: 1
[2026-03-13 15:27:48.893] [debug] > barcoding threads 15, writer threads 1
[2026-03-13 15:27:48.893] [info] - Note: FASTQ output is not recommended as not all data can be preserved.
[2026-03-13 15:27:48.893] [debug] Creating output folder: 'demuxed_reads_fuzzy'. Length:42
[2026-03-13 15:27:48.905] [info] > starting barcode demuxing
[2026-03-13 15:27:48.906] [debug] > Kits to evaluate: 1
[2026-03-13 15:27:53.338] [debug] Processed 50000 reads
[2026-03-13 15:28:00.084] [debug] Processed 100000 reads
[2026-03-13 15:28:06.741] [debug] Processed 150000 reads
[2026-03-13 15:28:13.573] [debug] Processed 200000 reads
[2026-03-13 15:28:20.468] [debug] Processed 250000 reads
[2026-03-13 15:28:27.474] [debug] Processed 300000 reads
[2026-03-13 15:28:34.160] [debug] Processed 350000 reads
[2026-03-13 15:28:40.945] [debug] Processed 400000 reads
[2026-03-13 15:28:47.783] [debug] Processed 450000 reads
[2026-03-13 15:28:54.782] [debug] Processed 500000 reads
[2026-03-13 15:29:01.908] [debug] Processed 550000 reads
[2026-03-13 15:29:09.121] [debug] Processed 600000 reads
[2026-03-13 15:29:16.343] [debug] Processed 650000 reads
[2026-03-13 15:29:23.593] [debug] Processed 700000 reads
[2026-03-13 15:29:30.727] [debug] Processed 750000 reads
[2026-03-13 15:29:37.827] [debug] Processed 800000 reads
[2026-03-13 15:29:45.007] [debug] Processed 850000 reads
[2026-03-13 15:29:51.206] [debug] Total reads processed: 892590
[2026-03-13 15:29:53.392] [info] > Finished in (ms): 124499
[2026-03-13 15:29:53.392] [info] > Reads written: 892590
[2026-03-13 15:29:53.392] [info] > 892590 reads demuxed @ classifications/s: 7.169455e+03
[2026-03-13 15:29:53.392] [debug] Barcode distribution :
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode12 : 1
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode29 : 1
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode41 : 53149
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode42 : 29729
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode43 : 6985
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode44 : 39717
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode45 : 6559
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode46 : 3361
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode47 : 781
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode48 : 6369
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode49 : 19349
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode50 : 14554
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode51 : 39214
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode52 : 46481
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode53 : 17797
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode54 : 11315
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode55 : 94278
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode56 : 58064
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode57 : 29393
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode58 : 24865
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode59 : 42423
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode60 : 43877
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode61 : 54342
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode62 : 75994
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode63 : 17570
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode64 : 16580
[2026-03-13 15:29:53.392] [debug] SQK-NBD114-96_barcode65 : 2
[2026-03-13 15:29:53.393] [debug] SQK-NBD114-96_barcode66 : 5
[2026-03-13 15:29:53.393] [debug] SQK-NBD114-96_barcode67 : 2
[2026-03-13 15:29:53.393] [debug] SQK-NBD114-96_barcode68 : 5
[2026-03-13 15:29:53.393] [debug] SQK-NBD114-96_barcode81 : 1
[2026-03-13 15:29:53.393] [debug] unclassified : 139827
[2026-03-13 15:29:53.393] [debug] Classified rate 84.33469%
[2026-03-13 15:29:53.393] [info] > finished barcode demuxing
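
The headline figures can be recomputed from the totals in the two logs above. The short script below is only a cross-check of that arithmetic; the helper name and the dictionary layout are made up for illustration, while the numbers are copied from the log lines.

```python
def classified_rate(total, unclassified):
    """Percentage of reads assigned to any barcode."""
    return 100.0 * (total - unclassified) / total


# Values taken from the "Total reads processed", "unclassified" and
# "Finished in (ms)" lines of the two runs.
runs = {
    "default": {"total": 892590, "unclassified": 848918, "ms": 25853},
    "fuzzy (N=3)": {"total": 892590, "unclassified": 139827, "ms": 124499},
}

for name, run in runs.items():
    rate = classified_rate(run["total"], run["unclassified"])
    reads_per_s = run["total"] / (run["ms"] / 1000.0)
    print(f"{name}: {rate:.2f}% classified, {reads_per_s:.0f} reads/s")

slowdown = runs["fuzzy (N=3)"]["ms"] / runs["default"]["ms"]
print(f"fuzzy run took {slowdown:.1f}x as long")
```

This reproduces the 4.89% and 84.33% classified rates reported in the logs, and shows that the fuzzy run took roughly 4.8 times as long on the same input.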

palakpsheth · 2026-03-13T21:40:54Z

@uzbit love this

malton-ont · 2026-03-19T11:36:10Z

Hi @uzbit,

Could you explain what issue this PR is attempting to resolve? If you are having issues with classification rates I would recommend opening an issue so we can take a look at your data in the first instance - there may be valid reasons why dorado is refusing to classify these reads.

> It was pointed out to me that the default demux for dorado is not classifying many reads due to exact matching

This is incorrect - dorado does not require exact matching of barcodes. It already allows a number of mismatches (see here where we check the barcode penalty against m_scoring_params.max_barcode_penalty).

> there aren't any flags to modify demux parameters

This is true, but scoring parameters can be overridden by creating a custom barcode configuration - see the documentation here. Note that this needs to be a full custom configuration, including the barcode sequences. We're looking at making this easier in a future release.

Regarding the PR itself, your barcode_fuzzy function is skipping a number of checks that the standard barcoding requires - barcode proximity to the ends of the read, flank scores, midstrand barcode detection (in fact, it seems to directly permit and classify based on midstrand barcodes?). It is also not suitable for double-ended barcode kits, but does not make any guards against using these.
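
To make the kinds of checks listed above concrete, here is a rough sketch of a gate that a candidate barcode hit might have to pass: a penalty limit, a flank score floor, and a requirement that the hit sits near one end of the read. It is not dorado's implementation. Every name and threshold value below is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    barcode: str
    penalty: int        # edit distance between the barcode and the read window
    flank_score: float  # 0..1 similarity of the flanking sequence
    start: int          # start of the hit in the read
    end: int            # end of the hit in the read (exclusive)


@dataclass
class Thresholds:
    # Hypothetical values, not dorado defaults.
    max_penalty: int = 3
    min_flank_score: float = 0.5
    end_proximity: int = 75


def accept(hit, read_len, t):
    """Return True only if the hit passes all three illustrative checks."""
    near_front = hit.start <= t.end_proximity
    near_rear = read_len - hit.end <= t.end_proximity
    if not (near_front or near_rear):
        return False  # midstrand hit: do not classify on it
    if hit.flank_score < t.min_flank_score:
        return False  # surrounding adapter context looks wrong
    return hit.penalty <= t.max_penalty


t = Thresholds()
print(accept(Candidate("bc41", 2, 0.9, 30, 54), 5000, t))      # -> True
print(accept(Candidate("bc41", 2, 0.9, 2400, 2424), 5000, t))  # -> False
```

A matcher that only compares edit distance against a limit, with no positional or flank checks, would accept the second hit as well.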

uzbit · 2026-03-25T16:08:11Z

> Could you explain what issue this PR is attempting to resolve? If you are having issues with classification rates I would recommend opening an issue so we can take a look at your data in the first instance - there may be valid reasons why dorado is refusing to classify these reads.

I will verify I can share this dataset with you before opening an issue. Thanks for the offer and I'll get back to you about this soon.

> Regarding the PR itself, your barcode_fuzzy function is skipping a number of checks that the standard barcoding requires - barcode proximity to the ends of the read, flank scores, midstrand barcode detection (in fact, it seems to directly permit and classify based on midstrand barcodes?). It is also not suitable for double-ended barcode kits, but does not make any guards against using these.

Granted, this function loosens all of the restrictions on matching barcodes to sequences, and that is the point. It is intended as a diagnostic tool for looking at a given dataset. At least for my dataset, the default parameters seem too strict. I understand if you'd like to close this, and I look forward to when you add user-modifiable demuxing parameters.

MarkBicknellONT and others added 2 commits February 19, 2026 19:47

Version 1.4 bump

ba44a01

Adding a fuzzy matching parameter to use when demuxing reads

c3048d0

fix version info for master

02aff13

uzbit wants to merge 3 commits into nanoporetech:master from uzbit:feat_demuxfuzz

4 participants
