A basic implementation of a trigram language model built from scratch using Python and NumPy.
The project trains on a text corpus, builds unigram/bigram/trigram statistics, and supports:
- Sentence generation (eval.py)
- Next word prediction (predict_one.py)
- REST API serving with FastAPI (api.py)
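
To illustrate the idea behind the trigram statistics, here is a minimal sketch of counting trigrams and using them for greedy next-word prediction and generation. It is not the project's code: the function names `train_trigrams`, `predict_next`, and `generate` are hypothetical, the toy corpus is made up, and it uses only the standard library for brevity, whereas the project itself relies on NumPy.

```python
from collections import Counter, defaultdict


def train_trigrams(tokens):
    """Count how often each word follows each two-word context."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts


def predict_next(counts, w1, w2):
    """Return the most frequent follower of (w1, w2), or None if unseen."""
    followers = counts.get((w1, w2))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, w1, w2, max_words):
    """Greedily extend the two seed words until max_words or a dead end."""
    words = [w1, w2]
    while len(words) < max_words:
        nxt = predict_next(counts, words[-2], words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


# Toy corpus, for illustration only
corpus = "the quick brown fox jumps over the lazy dog".split()
model = train_trigrams(corpus)
print(predict_next(model, "the", "quick"))  # brown
print(generate(model, "the", "quick", 9))
```

Greedy decoding always picks the most frequent follower, so the same seed words always give the same sentence. An unseen context returns `None` here because the sketch applies no smoothing or backoff.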
Train in Jupyter Notebook:

    jupyter notebook notebooks/3gram_model.ipynb

Run evaluation / prediction:

    python3 models/eval.py the quick 20
    python3 models/predict_one.py the quick

Start API:

    uvicorn api:app --reload --port 8000

Example Output:

    Input: the quick
    Generated: the quick brown fox jumps over the lazy dog ...

Future Work: Add Kneser–Ney smoothing, sampling with temperature, and top-k decoding.
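
As a rough sketch of how two of the planned features could fit together, the snippet below applies top-k truncation and then temperature scaling to the follower counts of one context. Nothing here exists in the project yet: `top_k_temperature_probs`, `sample_next`, and the example counts are hypothetical, and Kneser–Ney smoothing is not covered. It assumes all counts are positive, which holds for counts of observed trigrams.

```python
import math
import random


def top_k_temperature_probs(counts, k, temperature):
    """Turn follower counts into sampling probabilities.

    Keeps the k most frequent words, then rescales so that
    p_i is proportional to count_i ** (1 / temperature).
    """
    if not counts:
        raise ValueError("counts must not be empty")
    if k < 1 or temperature <= 0:
        raise ValueError("k must be >= 1 and temperature must be > 0")
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Work in log space and subtract the maximum for numerical stability
    logits = [math.log(c) / temperature for _, c in top]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return {word: w / total for (word, _), w in zip(top, weights)}


def sample_next(counts, k=3, temperature=1.0, rng=random):
    """Draw one word from the truncated, temperature-scaled distribution."""
    probs = top_k_temperature_probs(counts, k, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


# Made-up follower counts for a single two-word context
followers = {"brown": 6, "red": 3, "lazy": 1}
print(top_k_temperature_probs(followers, k=2, temperature=1.0))
print(sample_next(followers, k=2, temperature=0.5, rng=random.Random(0)))
```

A temperature below 1 sharpens the distribution toward the most frequent word, a temperature above 1 flattens it, and `k=1` reduces to greedy decoding.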
Contributions, issues, and feature requests are welcome!
If you’d like to improve this project:
- Open an Issue describing the bug, feature, or enhancement.
- Fork the repository and create a new branch.
- Open a Pull Request (PR) with your changes.
This project is licensed under the Apache License 2.0.