Releases: ModelCloud/Tokenicer
Tokenicer v0.0.6
What's Changed
- [FIX] avoid proxying call to inner tokenizer (ChatGLMTokenizer compatibility) by @ZX-ModelCloud in #42
New Contributors
- @ZX-ModelCloud made their first contribution in #42
Full Changelog: v0.0.5...v0.0.6
Toke(n)icer v0.0.5
What's Changed
- Fix pad_token for longcat by @LRL-ModelCloud in #36
- Update test_pad_token.py by @LRL-ModelCloud in #38
- [CI] use docker image by @CSY-ModelCloud in #40
- [CI] add gemma3 test by @CL-ModelCloud in #35
New Contributors
- @LRL-ModelCloud made their first contribution in #36
Full Changelog: v0.0.4...v0.0.5
Toke(n)icer v0.0.4
What's Changed
⚡ A Tokenicer instance now dynamically inherits the native tokenizer.__class__ of the tokenizer that is passed in or loaded via the Tokenicer.load() API.
⚡ CI now tests tokenizers from 64 models
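The dynamic inheritance described above can be pictured with a minimal, library-free sketch. It is not Tokenicer's actual implementation: BaseTokenizer, WrapperMixin, and wrap() are hypothetical names invented for illustration, and the real library may build its class differently. The sketch only shows the general Python technique of creating a subclass at runtime from whatever class an instance already has.

```python
class BaseTokenizer:
    """Hypothetical stand-in for a native tokenizer class."""

    def __init__(self, vocab):
        self.vocab = vocab

    def encode(self, text):
        # Map whitespace-separated tokens to their ids.
        return [self.vocab[tok] for tok in text.split()]


class WrapperMixin:
    """Hypothetical mixin that carries the wrapper's extra behaviour."""

    def describe(self):
        return f"wrapper around {self.native_cls.__name__}"


def wrap(tokenizer):
    """Return the same instance, re-typed as a runtime-built subclass."""
    native_cls = tokenizer.__class__
    # Build a new class that inherits from the mixin and the native class,
    # so isinstance() checks against the native class keep passing.
    dynamic_cls = type(
        f"Wrapped{native_cls.__name__}",
        (WrapperMixin, native_cls),
        {"native_cls": native_cls},
    )
    # Re-point the instance at the new class; its attributes are untouched.
    tokenizer.__class__ = dynamic_cls
    return tokenizer


tok = wrap(BaseTokenizer({"hello": 0, "world": 1}))
print(type(tok).__name__, isinstance(tok, BaseTokenizer), tok.encode("hello world"))
```

Because the wrapped object is still an instance of its original class, code that type-checks the tokenizer continues to work without any call forwarding.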
- fix mpt pad token bug by @CL-ModelCloud in #24
- fix model_config bugs by @CL-ModelCloud in #25
- test code clean up by @CL-ModelCloud in #26
- Inherits PretrainedTokenizer by @Qubitium in #28
- loop & test all models by @CSY-ModelCloud in #30
Full Changelog: v0.0.2...v0.0.4
Toke(n)icer v0.0.3
What's Changed
A Tokenicer instance now dynamically inherits the native tokenizer.__class__ of the tokenizer that is passed in or loaded via the Tokenicer.load() API.
- fix mpt pad token bug by @CL-ModelCloud in #24
- fix model_config bugs by @CL-ModelCloud in #25
- test code clean up by @CL-ModelCloud in #26
- Inherits PretrainedTokenizer by @Qubitium in #28
Full Changelog: v0.0.2...v0.0.3
Toke(n)icer v0.0.2
What's Changed
⚡ Auto-fix models not setting padding_token
⚡ Auto-fix models released with the wrong padding_token: many models incorrectly use eos_token as pad_token, which leads to subtle, hidden errors in post-training and inference whenever batching is used, which is almost always.
⚡ Compatible with all tokenizers recognized by HF Transformers
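One way a shared eos/pad token can cause trouble is sketched below. This is an illustration under stated assumptions, not the library's code and not necessarily the specific errors the notes refer to: the token ids are made up, pad_batch() and mask_padding() are hypothetical helpers, and -100 is used because it is the value commonly chosen to exclude positions from the loss. When padding positions are masked by token id, a pad id equal to the EOS id also masks the genuine end-of-sequence token.

```python
IGNORE_INDEX = -100  # commonly used to exclude positions from the loss
EOS_ID = 2           # made-up id for illustration
PAD_ID = 0           # made-up dedicated pad id, unused by real tokens here


def pad_batch(sequences, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


def mask_padding(batch, pad_id):
    """Replace every occurrence of pad_id with IGNORE_INDEX to build labels."""
    return [[IGNORE_INDEX if tok == pad_id else tok for tok in row] for row in batch]


sequences = [[5, 6, 7, EOS_ID], [8, EOS_ID]]

# pad_token == eos_token: the real EOS in each row is masked along with padding.
shared = mask_padding(pad_batch(sequences, EOS_ID), EOS_ID)

# Dedicated pad token: only the padding is masked, the EOS label survives.
distinct = mask_padding(pad_batch(sequences, PAD_ID), PAD_ID)

print(shared)
print(distinct)
```

In the shared case no EOS label remains in either row, so a model trained on such labels gets no signal for when to stop generating; with a dedicated pad id the EOS labels are preserved.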
- Auto fix pad token by @CL-ModelCloud in #5
- Forward to Tokenizer by @CL-ModelCloud in #6
- read requirements.txt in setup.py by @CSY-ModelCloud in #7
- [CI] add tokenicer forward test by @CL-ModelCloud in #10
- add unit tests by @CSY-ModelCloud in #11
- refactor by @Qubitium in #8
- add deepseek_v3 map by @CL-ModelCloud in #15
New Contributors
- @CSY-ModelCloud made their first contribution in #1
- @Qubitium made their first contribution in #3
- @CL-ModelCloud made their first contribution in #5
Full Changelog: https://github.com/ModelCloud/Tokenicer/commits/v0.0.2