Movie Genre Classification With CI/CD Implementation in Heroku

This repository contains a movie genre classification project that predicts the genres of movies based on their overviews. It includes a trained model, a Flask web application, and instructions for local setup and usage.

Prerequisites

Python 3.8 or higher Docker (optional)

Introduction

The movie genre classification project uses natural language processing techniques to predict the genres of movies based on their overviews. It employs a machine learning model trained on a labeled dataset of movie overviews and corresponding genres.

Setup

To use the movie genre classification project, you can follow the instructions below to set it up in your local environment or as a Docker container.

Local Setup

Clone the repository to your local machine:

git clone https://github.com/kndeepak/CI-CD-Pipeline-Implementation.git
Navigate to the project directory:

cd Movie-Genere-CI-CD-Pipeline-Prediction
Install the required dependencies:

pip install -r requirements.txt
Run the Flask web application:

python app.py

Docker Setup

Install Docker on your machine. Refer to the official Docker documentation for instructions specific to your operating system.
Clone the repository to your local machine:

git clone https://github.com/nursnaaz/Movie-Genere-CI-CD-Pipeline-Prediction.git
Navigate to the project directory:

cd Movie-Genere-CI-CD-Pipeline-Prediction
Build the Docker image:

docker build -t movie-genre-classification .
Run the Docker container:

docker run -p 5555:5555 movie-genre-classification

Usage

Once you have set up the movie genre classification project, you can use it in two ways: through the web application or via the API endpoint.

Web Application

Access the local web application by opening your web browser and visiting http://localhost:5555.
Access the Heroku web application by opening your web browser and visiting https://movie-genere.herokuapp.com/

Enter a movie overview in the provided input field.

Click the "Predict" button to get the predicted genres for the movie.

Local API Endpoint

You can also make predictions using the API endpoint.

Endpoint: http://localhost:5555/predict_api Method: POST Request Payload: Parameter name: overview Parameter value: A vengeful New York transit cop decides to steal a trainload of subway fares; his foster brother, a fellow cop, tries to protect him.

Heroku Endpoint

Endpoint: https://movie-genere.herokuapp.com/predict_api Method: POST Request Payload: Parameter name: overview Parameter value: A vengeful New York transit cop decides to steal a trainload of subway fares; his foster brother, a fellow cop, tries to protect him.

Example using cURL:

curl -d "overview=A vengeful New York transit cop decides to steal a trainload of subway fares; his foster brother, a fellow cop, tries to protect him." -X POST https://movie-genere.herokuapp.com/predict_api

curl -d "overview=A vengeful New York transit cop decides to steal a trainload of subway fares; his foster brother, a fellow cop, tries to protect him." -X POST http:///localhost:5555/predict_api

Model Training

The model for movie genre classification was trained using the provided code. Here's a summary of the training process:

The dataset used for training is the movies_metadata.csv file, which can be downloaded from Kaggle (34.45MB file within the 239MB zip file).
The data was preprocessed and cleaned by extracting the 'overview' and 'genres' columns from the dataset.
A TextPreprocessor class was implemented to clean the text data by removing special characters, converting to lowercase, and removing stopwords.
The dataset was split into training and testing sets using a 80% ratio.
The target labels were binarized using MultiLabelBinarizer.
The movie genre classification model was built using a pipeline that includes text preprocessing, TF-IDF vectorization, and a logistic regression classifier.
The model was trained using the training dataset.
The trained model pipeline and the MultiLabelBinarizer were saved as model_pipeline.pkl and mlb.pkl, respectively.

Model Evaluation

After training the model, it was evaluated using the testing dataset. Here are the evaluation results:

Accuracy: 0.16206766206766207
F1-score: 0.5600685836094441 (micro-average F1-score for multi-label classification)

These metrics provide an indication of the model's performance in predicting the genres of movies based on their overviews.

CI/CD

This repository implements CI/CD (Continuous Integration/Continuous Deployment) using GitHub Actions and Heroku. The CI/CD pipeline ensures that whenever a code commit is made, the code is automatically built, tested, and deployed to Heroku.

The workflow in the .github/workflows/main.yml file defines the CI/CD pipeline. It includes steps for installing dependencies, running tests, and deploying the application to Heroku.

To set up CI/CD for your own repository, you can follow these steps:

Create a Heroku account (if you don't have one already) and create a new app.
Set up the Heroku CLI on your local machine and log in to your Heroku account.
Add the necessary Heroku environment variables in your GitHub repository's secrets. These variables may include the Heroku API key, Heroku app name, etc.
Push the code to the GitHub repository, and the CI/CD pipeline will automatically trigger. The workflow will build, test, and deploy the code to your Heroku app.

License

The movie genre classification project is released under the MIT License. You are free to use, modify, and distribute the code for personal and commercial purposes.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
templates		templates
Dockerfile		Dockerfile
Draft-Training.ipynb		Draft-Training.ipynb
Model Training-Finalized.ipynb		Model Training-Finalized.ipynb
Model training - Finalised - Testing.ipynb		Model training - Finalised - Testing.ipynb
README.md		README.md
app.py		app.py
mlb.pkl		mlb.pkl
model_pipeline.pkl		model_pipeline.pkl
requirements.txt		requirements.txt
test_app.py		test_app.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Movie Genre Classification With CI/CD Implementation in Heroku

Prerequisites

Table of Contents

Introduction

Setup

Local Setup

Docker Setup

Usage

Web Application

Local API Endpoint

Heroku Endpoint

Example using cURL:

Model Training

Model Evaluation

CI/CD

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

kndeepak/CI-CD-Pipeline-Implementation

Folders and files

Latest commit

History

Repository files navigation

Movie Genre Classification With CI/CD Implementation in Heroku

Prerequisites

Table of Contents

Introduction

Setup

Local Setup

Docker Setup

Usage

Web Application

Local API Endpoint

Heroku Endpoint

Example using cURL:

Model Training

Model Evaluation

CI/CD

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages