This is a modified version of the prepare_dataset and train scripts from https://github.com/pacman100/DHS-LLM-Workshop, used to fine-tune an LLM.
prepare_dataset.py creates a dataset and uploads it to Hugging Face. It is usually run on a local machine.
train.py trains a given model on a given dataset. It is used in the Colab notebooks.
requirements.txt lists the packages needed to train a model. It is used in the Colab notebooks.
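
As a rough illustration of the "create a dataset and upload it" step, here is a minimal sketch. It is not the code in prepare_dataset.py: the helper names build_records and upload, the "content" column, the (prompt, completion) input layout, and the repo id are all assumptions made for this example. Only Dataset.from_dict and push_to_hub come from the Hugging Face datasets library.

```python
from typing import Dict, Iterable, List, Tuple


def build_records(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Join each (prompt, completion) pair into one training string.

    The "content" column name and the pair layout are illustrative
    assumptions, not the schema used by prepare_dataset.py.
    """
    contents = [
        f"{prompt.strip()}\n{completion.strip()}" for prompt, completion in pairs
    ]
    return {"content": contents}


def upload(records: Dict[str, List[str]], repo_id: str) -> None:
    """Push the records to the Hugging Face Hub.

    Needs network access and a Hugging Face login token.
    """
    # Imported lazily so build_records works without `datasets` installed.
    from datasets import Dataset

    Dataset.from_dict(records).push_to_hub(repo_id)


if __name__ == "__main__":
    records = build_records([("What is 2 + 2?", "4")])
    print(records)
    # Hypothetical repo id; uncomment after logging in to Hugging Face.
    # upload(records, "your-username/your-dataset")
```

Keeping the record building separate from the upload means the data can be inspected locally before anything is pushed, which matches running the preparation step on a local machine.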