Issues: huggingface/tokenizers
Issues list
Allow users to select/write encoding strategies
Feature Request
#1655
opened Oct 16, 2024 by
pietrolesci
Inconsistent behaviour of PreTrainedTokenizerFasts on diacritics marked texts
bug
Something isn't working
#1663
opened Oct 11, 2024 by
sven-nm
2 of 4 tasks
Disable pretty-print when saving tokenizer.json files
Feature Request
#1656
opened Oct 7, 2024 by
xenova
How to build a custom tokenizer on top of an existing Llama 3.2 tokenizer?
training
#1644
opened Oct 5, 2024 by
yakhyo
NormalizedString.clear() broken?
bug
Something isn't working
#1636
opened Sep 25, 2024 by
lkurlandski
Adding many AddedTokens makes loading a tokenizer extremely slow.
#1635
opened Sep 25, 2024 by
stephantul
Rust: How to handle models with precompiled_charsmap = null
Feature Request
#1627
opened Sep 4, 2024 by
kallebysantos
Special token gets tokenized while training tokenizer from scratch
#1624
opened Sep 2, 2024 by
LalchandPandia
ModuleNotFoundError: No module named 'tokenizers.tokenizers'
#1619
opened Aug 25, 2024 by
jpferraro1
Space after unnormalized token is added when use_fast=True for Llama tokenizer
#1613
opened Aug 14, 2024 by
Butanium
Support for Golang now or support a cli for other languages?
#1601
opened Aug 7, 2024 by
xuxiaoxia96