NLP toolkit for 20+ Turkic languages — a pip-installable open-source Python library with adaptations for the low-resource, morphologically rich Turkic language family.
Maintained by Sherzod Hakimov
If you use TurkicNLP in your research, please cite:
@misc{hakimov2026turkicnlpnlptoolkit,
title={TurkicNLP: An NLP Toolkit for Turkic Languages},
author={Sherzod Hakimov},
year={2026},
eprint={2602.19174},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.19174},
}

- 24 Turkic languages from Turkish to Sakha, Kazakh to Uyghur
- Script-aware from the ground up — Latin, Cyrillic, Perso-Arabic, Old Turkic Runic
- Automatic script detection and bidirectional transliteration
- Morphology analyser for ~20 Turkic languages
- Universal Dependencies integration: pretrained tokenization, POS tagging, lemmatization, dependency parsing, and NER
- Pretrained embeddings and a translation backend: get sentence vectors and translate across many languages
- License: Apache-2.0
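To make the script-detection feature in the list above more concrete, here is a minimal sketch of how a script could be guessed from Unicode character names using only the Python standard library. It is an illustration under stated assumptions, not TurkicNLP's implementation or API: the function name `detect_script` and the label strings are hypothetical.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from Unicode character-name prefixes to script labels.
# This is NOT TurkicNLP's API, only a standard-library illustration.
SCRIPT_PREFIXES = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Perso-Arabic",
    "OLD TURKIC": "Old Turkic Runic",
}


def detect_script(text: str) -> str:
    """Return the label of the script that most letters in `text` belong to."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        for prefix, label in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[label] += 1
                break
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["Merhaba dünya", "Сәлем әлем", "ياخشىمۇسىز"]:
        print(sample, "->", detect_script(sample))
```

A majority vote over letters keeps the sketch robust to mixed input such as a Cyrillic sentence containing a Latin brand name; a production detector would likely need finer handling than this.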

Geographic distribution of Turkic languages (source: Wikimedia Commons)
https://github.com/turkic-nlp/turkicnlp
https://github.com/turkic-nlp/turkic-nlp-code-samples
pip install turkicnlp

To install all required dependencies at once:

pip install "turkicnlp[all]"

With optional dependencies:
pip install "turkicnlp[stanza]" # Stanza/UD neural models
pip install "turkicnlp[all]" # Everything: stanza, NLLB embeddings & translations
pip install "turkicnlp[dev]" # Development tools
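
After installing, it can be useful to confirm which pieces are actually present in the environment. The sketch below uses only the standard library (`importlib.metadata` and `importlib.util`). It assumes that the distribution name is `turkicnlp`, as in the pip commands above, and that the `[stanza]` extra provides an importable `stanza` module. The helper names are hypothetical and are not part of TurkicNLP.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "turkicnlp" is the name used in the pip commands above.
    print("turkicnlp:", installed_version("turkicnlp") or "not installed")
    # Assumption: the [stanza] extra installs the `stanza` package.
    print("stanza available:", module_available("stanza"))
```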