
TurkicNLP

NLP Toolkit for Turkic Languages

TurkicNLP is a pip-installable, open-source Python toolkit for 20+ Turkic languages, with adaptations for the low-resource, morphologically rich Turkic language family.

Maintained by Sherzod Hakimov

License: Apache-2.0 | Python 3.9+ | Status: Pre-Alpha | 24 Turkic Languages

Citation

If you use TurkicNLP in your research, please cite:

@misc{hakimov2026turkicnlpnlptoolkit,
      title={TurkicNLP: An NLP Toolkit for Turkic Languages}, 
      author={Sherzod Hakimov},
      year={2026},
      eprint={2602.19174},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.19174}, 
}

Features

  • 24 Turkic languages, from Turkish to Sakha and from Kazakh to Uyghur
  • Script-aware from the ground up: Latin, Cyrillic, Perso-Arabic, and Old Turkic Runic
  • Automatic script detection and bidirectional transliteration
  • Morphological analyser for ~20 Turkic languages
  • Universal Dependencies integration: pretrained tokenization, POS tagging, lemmatization, dependency parsing, and NER
  • Pretrained embeddings and a translation backend: get sentence vectors and translate across many languages
  • License: Apache-2.0
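
To make the idea of automatic script detection concrete, here is a minimal sketch that guesses the dominant script of a string from Unicode character names. It is an illustration only and is not the TurkicNLP API: the function name detect_script and the SCRIPT_PREFIXES table are hypothetical, and the toolkit's own detector may work differently.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping (an assumption for this sketch, not part of turkicnlp):
# Unicode character-name prefix -> script label used on this page.
SCRIPT_PREFIXES = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Perso-Arabic",
    "OLD TURKIC": "Old Turkic Runic",
}


def detect_script(text: str) -> str:
    """Return the label of the script that most letters in `text` belong to."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        for prefix, script in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[script] += 1
                break
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["Merhaba dünya", "Сәлем әлем", "سالام", "\U00010C00\U00010C01"]:
        print(repr(sample), "->", detect_script(sample))
```

A majority vote over letters keeps the sketch robust to mixed input such as a Cyrillic sentence containing a Latin brand name; a production detector would likely also need language-specific rules.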

Supported Languages and Components

[Figure: Geographic distribution of Turkic languages (source: Wikimedia Commons)]

Open-source Library

https://github.com/turkic-nlp/turkicnlp

Code Samples

https://github.com/turkic-nlp/turkic-nlp-code-samples

Installation

pip install turkicnlp

To install the library together with all optional dependencies at once:

pip install "turkicnlp[all]"

The available optional extras are:

pip install "turkicnlp[stanza]"        # Stanza/UD neural models
pip install "turkicnlp[all]"           # Everything: stanza, NLLB embeddings & translations
pip install "turkicnlp[dev]"           # Development tools
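
After installing, it can be useful to confirm that the package is visible to the current Python environment. The sketch below uses only the standard library (importlib.metadata, available in Python 3.9+), so it makes no assumptions about the TurkicNLP API itself; the helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "turkicnlp") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("turkicnlp is not installed; run: pip install turkicnlp")
    else:
        print(f"turkicnlp {version} is installed")
```

Querying the distribution metadata avoids importing the package, so the check stays fast even when heavy optional dependencies are installed.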

More info

https://turkic-nlp.github.io/

Popular repositories

  1. turkicnlp (Public, Python): NLP Toolkit for Turkic Languages
  2. turkic-nlp.github.io (Public, HTML): A web page that shows available resources for Turkic languages.
  3. .github (Public)
  4. apertium-data (Public, Python)
  5. trained-stanza-models (Public)
