Pull requests: huggingface/tokenizers

Pull requests list

feat(NFC): skip Unicode pass for all-ASCII inputs
#2037 opened Apr 26, 2026 by KimBioInfoStudio
2 of 3 tasks
feat: SIMD ASCII fast path for Lowercase normalizer (~30-49x)
#2036 opened Apr 26, 2026 by KimBioInfoStudio
5 of 6 tasks
V0.23 release
#2032 opened Apr 24, 2026 by ArthurZucker (Collaborator)
Batch encode: lock-free work queue with dynamic window sizing
#2029 opened Apr 23, 2026 by sebpop (Contributor)
perf: skip alignment tracking in encode_fast normalization
#2022 opened Apr 10, 2026 by ArthurZucker (Collaborator)
Reduce crate size
#2015 opened Apr 9, 2026 by ArthurZucker (Collaborator)
node: bump version to 0.22.2 for release
#2009 opened Apr 4, 2026 by MayCXC (Contributor)
feat(pattern): parallel regex find_matches for large inputs
#2003 opened Mar 31, 2026 by McPatate (Member)
fix: skip serializing ByteLevel fields at their default value
#2001 opened Mar 30, 2026 by ArthurZucker (Collaborator)
Regex split parity
#1991 opened Mar 27, 2026 by ArthurZucker (Collaborator)
feat: add new faster whitespace split pretok
#1985 opened Mar 26, 2026 by McPatate (Member)
Implementing Parity-aware BPE
#1974 opened Mar 21, 2026 by cimeister
feat: add pcre2 as optional feature
#1959 opened Mar 2, 2026 by wheynelau (Contributor)
Add get_special_tokens and is_special_token methods
#1945 opened Feb 5, 2026 by ArthurZucker (Collaborator)
2 tasks done
Add post_process_tokens and post_process_ids methods
#1944 opened Feb 5, 2026 by ArthurZucker (Collaborator)
3 tasks done