This repository contains a Python program that classifies Amazon reviews into five rating classes using natural language processing techniques. The classifier combines TF-IDF vectorization with logistic regression.
- `classifier.py`: The main Python script that performs the classification task.
- `config.json`: Configuration file specifying the file paths for training and testing data.
- `data/`: Directory containing the JSON files for training and testing data.
- Clone this repository to your local machine.
- Update the `config.json` file with the correct file paths for your training and testing data.
- Run the `main.py` script.
The program outputs the classification results for the test data, including accuracy metrics and a confusion matrix.
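As a rough illustration of the configuration step in the usage instructions, the sketch below reads file paths from a JSON config and checks that the expected entries are present. The key names `train_path` and `test_path` and the helpers `load_config` and `validate_config` are assumptions made for this example; the actual keys in `config.json` may differ.

```python
import json


def load_config(path):
    """Read a JSON configuration file and return its contents as a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def validate_config(config, required=("train_path", "test_path")):
    """Return only the required entries, raising if any are missing.

    The key names in `required` are hypothetical; adjust them to match
    the keys actually used in config.json.
    """
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"Missing config keys: {', '.join(missing)}")
    return {key: config[key] for key in required}
```

Checking the keys up front means a misspelled or missing path surfaces before any data is loaded, rather than partway through a run.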
The dataset used for this classification task consists of Amazon reviews. Each review is labeled with one of five rating classes.
- Data Preparation: The program reads the JSON files and organizes the data into a list of dictionaries.
- Feature Extraction: TF-IDF vectorization is applied to convert text data into numerical features.
- Model Training: A logistic regression model is trained on the TF-IDF vectors of the training data.
- Model Evaluation: The trained model is evaluated on the testing data to measure its performance.
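The four steps above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not necessarily how the repository's script is written: the record fields `text` and `rating`, the helper `split_fields`, and the toy records are all made up for illustration, and the real data would come from the JSON files named in `config.json`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Data preparation: a list of dictionaries, one per review.
# These records and the field names "text" and "rating" are invented
# for illustration; real records would be parsed from the JSON files.
train = [
    {"text": "terrible product, broke immediately", "rating": 1},
    {"text": "awful quality, do not buy", "rating": 1},
    {"text": "poor build and disappointing", "rating": 2},
    {"text": "not great, several problems", "rating": 2},
    {"text": "average, does the job", "rating": 3},
    {"text": "okay but nothing special", "rating": 3},
    {"text": "good value and works well", "rating": 4},
    {"text": "pretty good, happy with it", "rating": 4},
    {"text": "excellent, absolutely love it", "rating": 5},
    {"text": "perfect, best purchase ever", "rating": 5},
]
test = [
    {"text": "awful, broke immediately", "rating": 1},
    {"text": "okay, does the job", "rating": 3},
    {"text": "excellent purchase, love it", "rating": 5},
]


def split_fields(records):
    """Separate review texts from their rating labels."""
    texts = [record["text"] for record in records]
    labels = [record["rating"] for record in records]
    return texts, labels


train_texts, y_train = split_fields(train)
test_texts, y_test = split_fields(test)

# Feature extraction: fit the vocabulary and IDF weights on the training
# texts only, then reuse them to transform the test texts.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Model training: logistic regression over the five rating classes.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Model evaluation: accuracy and a 5x5 confusion matrix on the test data.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions, labels=[1, 2, 3, 4, 5])

print(f"Accuracy: {accuracy:.3f}")
print(matrix)
```

Fitting the vectorizer on the training texts alone keeps test-set vocabulary and document frequencies out of the features, so the evaluation reflects how the model handles unseen reviews. Passing `labels=[1, 2, 3, 4, 5]` to `confusion_matrix` keeps the matrix at five rows and columns even when some ratings do not appear in the test data.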
This project is licensed under the MIT License. See the LICENSE file for details.