Releases · ModelCloud/Tokenicer

09 Feb 08:27

Qubitium

v0.0.6

92bdf47

Tokenicer v0.0.6 Latest

What's Changed

[FIX] avoid proxying call to inner tokenizer (ChatGLMTokenizer compatibility) by @ZX-ModelCloud in #42

New Contributors

@ZX-ModelCloud made their first contribution in #42

Full Changelog: v0.0.5...v0.0.6

Contributors

ZX-ModelCloud

04 Sep 05:11

Qubitium

v0.0.5

63e3008

Toke(n)icer v0.0.5

What's Changed

Fix pad_token for longcat by @LRL-ModelCloud in #36
Update test_pad_token.py by @LRL-ModelCloud in #38
[CI] use docker image by @CSY-ModelCloud in #40
[CI] add gemma3 test by @CL-ModelCloud in #35

New Contributors

@LRL-ModelCloud made their first contribution in #36

Full Changelog: v0.0.4...v0.0.5

Contributors

LRL-ModelCloud, CL-ModelCloud, and CSY-ModelCloud

21 Feb 09:36

Qubitium

v0.0.4

dd95bdf

Toke(n)icer v0.0.4

What's Changed

⚡ A Tokenicer instance now dynamically inherits the native tokenizer.__class__ of the tokenizer passed in or loaded via the Tokenicer.load() API.
⚡ CI now tests tokenizers from 64 models
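
The dynamic inheritance described above can be illustrated with a minimal, library-free sketch. This is not Tokenicer's actual implementation; NativeTokenizer, WrapperMixin, and wrap() are hypothetical names used only to show the general Python technique of building a subclass at runtime and re-pointing an existing instance at it.

```python
class NativeTokenizer:
    """Stand-in for a tokenizer class from some library (hypothetical)."""

    def __init__(self, vocab):
        self.vocab = vocab

    def encode(self, text):
        return [self.vocab[word] for word in text.split()]


class WrapperMixin:
    """Hypothetical extra behavior layered on top of the native class."""

    def wrapped_name(self):
        return type(self).__name__


def wrap(tokenizer):
    """Re-class an existing instance so it inherits from its own native class."""
    native_cls = type(tokenizer)
    # Build a subclass at runtime: mixin first, native class second.
    dynamic_cls = type(f"Wrapped{native_cls.__name__}", (WrapperMixin, native_cls), {})
    # Swap the instance's class in place; its attributes are left untouched.
    tokenizer.__class__ = dynamic_cls
    return tokenizer


tok = wrap(NativeTokenizer({"hello": 0, "world": 1}))
```

Because the wrapped object is a true subclass instance, isinstance checks against the native class keep passing and native methods resolve through normal inheritance rather than through attribute forwarding.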

fix mpt pad token bug by @CL-ModelCloud in #24
fix model_config bugs by @CL-ModelCloud in #25
test code clean up by @CL-ModelCloud in #26
Inherits PretrainedTokenizer by @Qubitium in #28
loop & test all models by @CSY-ModelCloud in #30

Full Changelog: v0.0.2...v0.0.4

Contributors

Qubitium, CL-ModelCloud, and CSY-ModelCloud

21 Feb 07:18

Qubitium

v0.0.3

b0b2591

Toke(n)icer v0.0.3

What's Changed

A Tokenicer instance now dynamically inherits the native tokenizer.__class__ of the tokenizer passed in or loaded via the Tokenicer.load() API.

fix mpt pad token bug by @CL-ModelCloud in #24
fix model_config bugs by @CL-ModelCloud in #25
test code clean up by @CL-ModelCloud in #26
Inherits PretrainedTokenizer by @Qubitium in #28

Full Changelog: v0.0.2...v0.0.3

Contributors

Qubitium and CL-ModelCloud

10 Feb 13:41

Qubitium

v0.0.2

efc81a2

Toke(n)icer v0.0.2

What's Changed

⚡ Auto-fix models that do not set a padding_token
⚡ Auto-fix models released with the wrong padding_token: many models incorrectly use eos_token as pad_token, which leads to subtle, hidden errors in post-training and inference whenever batching is used, which is almost always.
⚡ Compatible with all HF Transformers recognized tokenizers
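
One way the eos_token-as-pad_token problem can surface during post-training is sketched below. It assumes a common pattern in which padding positions are masked out of the loss by setting their label to -100; the helper names pad_to and build_labels and the token ids are hypothetical, and this is not code from Tokenicer itself.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default
EOS_ID = 2           # hypothetical end-of-sequence token id


def pad_to(ids, length, pad_id):
    """Right-pad a list of token ids to a fixed length."""
    return ids + [pad_id] * (length - len(ids))


def build_labels(input_ids, pad_id):
    """Mask every position equal to pad_id so it is excluded from the loss."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


sequence = [5, 6, EOS_ID]

# pad_token == eos_token: the genuine EOS is masked along with the padding.
shared = build_labels(pad_to(sequence, 5, pad_id=EOS_ID), pad_id=EOS_ID)

# A distinct pad token: only the padding is masked and the EOS label survives.
distinct = build_labels(pad_to(sequence, 5, pad_id=0), pad_id=0)
```

In the shared case the model receives no training signal for emitting EOS, which is the kind of silent batching error a dedicated pad token avoids.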

Auto fix pad token by @CL-ModelCloud in #5
Forward to Tokenizer by @CL-ModelCloud in #6
read requirements.txt in setup.py by @CSY-ModelCloud in #7
[CI] add tokenicer forward test by @CL-ModelCloud in #10
add unit tests by @CSY-ModelCloud in #11
refactor by @Qubitium in #8
add deepseek_v3 map by @CL-ModelCloud in #15

New Contributors

@CSY-ModelCloud made their first contribution in #1
@Qubitium made their first contribution in #3
@CL-ModelCloud made their first contribution in #5

Full Changelog: https://github.com/ModelCloud/Tokenicer/commits/v0.0.2

Contributors

Qubitium, CL-ModelCloud, and CSY-ModelCloud
