This project fine-tunes and evaluates GPT-2 and LLaMA 3.2-1B models for sentiment classification on the IMDB dataset. It includes two scripts:

- `finetune_eval.py`: fine-tunes the models with LoRA and evaluates their performance.
- `baseline.py`: evaluates the raw, pre-trained models without fine-tuning as a baseline comparison.
- Distributed Training Support: Uses `torch.distributed` for multi-GPU training.
- Hugging Face Integration: Loads datasets and pre-trained models from Hugging Face.
- LoRA Fine-Tuning: Utilizes parameter-efficient training via LoRA in `finetune_eval.py`.
- Baseline Evaluation: Assesses unmodified models in `baseline.py`.
- Metrics & Evaluation: Computes accuracy, precision-recall, ROC curves, and confusion matrices.
- Logging & Checkpointing: Supports logging, checkpoint resumption, and detailed step timing.
- Data Visualization: Generates plots for training loss, precision-recall, ROC curves, confusion matrices, and word clouds of misclassified reviews.
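To illustrate why LoRA is parameter-efficient, the sketch below counts the trainable parameters a low-rank adapter adds to a single weight matrix. The dimensions and rank are hypothetical examples, not taken from this project's actual configuration:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by a LoRA adapter on one weight matrix.

    LoRA freezes the original (d_out x d_in) weight and learns two small
    matrices instead: A (rank x d_in) and B (d_out x rank).
    """
    return rank * d_in + d_out * rank

# Hypothetical GPT-2-sized projection: 768 x 768, adapter rank 8
full = 768 * 768                       # 589,824 frozen parameters
lora = lora_param_count(768, 768, 8)   # 12,288 trainable parameters
print(f"LoRA trains {lora / full:.1%} of the original matrix")  # 2.1%
```

With these numbers the adapter trains roughly 2% of the weights of the matrix it attaches to, which is what makes fine-tuning a 1B-parameter model feasible on modest hardware.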
Ensure you have Python 3.8+ and pip installed. Then install the required dependencies:

```bash
pip install -r requirements.txt
```

For multi-GPU training, use `torch.distributed.launch` or `torchrun`. Example:

```bash
torchrun --nproc_per_node=4 finetune_eval.py --debug
```

This command runs the script on 4 GPUs. Adjust `--nproc_per_node` to match the number of available GPUs.
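Under `torchrun`, each worker process receives its position in the job through environment variables (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`). A minimal sketch of how a script can pick these up; the variable names are set by `torchrun` itself, while the single-process defaults here are illustrative:

```python
import os

def get_dist_env():
    """Read the process layout that torchrun exports to each worker.

    When the script is launched without torchrun, these variables are
    absent and we fall back to a single-process layout.
    """
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank

rank, world_size, local_rank = get_dist_env()
print(f"worker {rank} of {world_size}, using local GPU {local_rank}")
```

`LOCAL_RANK` is typically used to select the CUDA device for each worker (e.g. `torch.cuda.set_device(local_rank)`).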
For the baseline evaluation on multiple GPUs:

```bash
torchrun --nproc_per_node=4 baseline.py --debug
```

Run the fine-tuning script on a single GPU with:

```bash
python finetune_eval.py --debug
```

Options:

- `--debug`: enables debug mode with a smaller dataset and reduced training steps.
- `--resume_from_checkpoint <path>`: resumes training from a specific checkpoint.
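A sketch of how these two flags could be declared with `argparse`; only the flag names come from the options above, the parser details are assumptions:

```python
import argparse

parser = argparse.ArgumentParser(description="Fine-tune and evaluate with LoRA")
parser.add_argument("--debug", action="store_true",
                    help="Use a smaller dataset and fewer training steps")
parser.add_argument("--resume_from_checkpoint", type=str, default=None,
                    help="Path to a checkpoint to resume training from")

args = parser.parse_args(["--debug"])
print(args.debug, args.resume_from_checkpoint)  # True None
```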
Run the baseline evaluation script with:

```bash
python baseline.py --debug
```

Options:

- `--debug`: runs the evaluation on a smaller dataset for faster testing.
Set your Hugging Face API token as an environment variable:

```bash
export HF_ACCESS_TOKEN=<your_token>
```

After running the scripts, the following outputs are saved:
- Trained Models & Tokenizers: stored in `checkpoints/`.
- Evaluation Metrics: saved as JSON (`eval_results.json`).
- Training Logs: stored in `logs/training_output.log`.
- Visualization Outputs:
  - `confusion_matrix.png`
  - `precision_recall_curve.png`
  - `roc_curve.png`
  - `training_loss.png`
  - `sentiment_distribution.png`
  - `misclassified_wordcloud.png`
For the baseline evaluation, the following outputs are saved:

- Evaluation Metrics: saved in `baseline_results/baseline_eval.json`.
- Logs: stored in `logs/baseline_eval_output.log`.
- Visualization Outputs:
  - `baseline_results/confusion_matrix.png`
  - `baseline_results/precision_recall_curve.png`
  - `baseline_results/roc_curve.png`
  - `baseline_results/prediction_distribution.png`
  - `baseline_results/misclassified_wordcloud.png`
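The metrics behind these outputs follow standard definitions. Below is a pure-Python sketch of accuracy and a 2x2 confusion matrix for binary sentiment labels (0 = negative, 1 = positive); the actual scripts presumably compute these with a library such as scikit-learn, so this is only an illustration:

```python
def confusion_matrix(y_true, y_pred):
    """2x2 counts for binary labels, laid out as [[TN, FP], [FN, TP]]."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return [[tn, fp], [fn, tp]]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(confusion_matrix(y_true, y_pred))  # [[1, 1], [1, 2]]
print(accuracy(y_true, y_pred))          # 0.6
```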
This project is open-source and free to use.