7rohxt/yt-sentiment-mlops

YouTube Comments Sentiment Analysis (MLOps - AWS)

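A sentiment analysis system for YouTube comments, built with classical ML (bag-of-words and TF-IDF features) and deployed on AWS with an MLOps stack: MLflow for experiment tracking, DVC for data and model versioning, Docker for packaging, and GitHub Actions for CI/CD.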

🎬 Demo

Chrome Extension Demo

Real-time sentiment analysis on YouTube comments

Live Features

  • Sentiment distribution pie chart
  • Word cloud visualization
  • Top 25 comments
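
As a rough illustration of what could sit behind the first two views, the sketch below tallies predicted labels into percentage shares (the input a pie chart needs) and counts word frequencies (the input a word cloud needs). The label names, the `STOPWORDS` set, and both function names are assumptions made for this example; the repository's own implementation may differ.

```python
from collections import Counter
import re

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "is", "this", "it", "and", "to", "of"}


def sentiment_distribution(labels):
    """Return the percentage share of each label, rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def word_frequencies(comments, top_n=50):
    """Count lowercase word occurrences across comments, skipping stopwords."""
    words = []
    for comment in comments:
        words.extend(
            w for w in re.findall(r"[a-z']+", comment.lower())
            if w not in STOPWORDS
        )
    return Counter(words).most_common(top_n)


if __name__ == "__main__":
    labels = ["positive", "negative", "positive", "neutral"]
    print(sentiment_distribution(labels))
    print(word_frequencies(["This is amazing!", "Amazing video", "Not good"], top_n=3))
```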

Project Workflow

Notebooks → MLflow → SRC → DVC → Flask API → Chrome Extension → Docker → AWS ECR & EC2 → GitHub Actions

Pipeline Overview

  1. Notebooks - Experiment with models (Baseline, BoW, TF-IDF)
  2. MLflow - Track experiments on dedicated EC2 + S3
  3. SRC - Production pipeline modules
  4. DVC - Version control for data/models
  5. Flask API - REST API with visualization endpoints
  6. Chrome Extension - Browser UI for YouTube
  7. Docker - Containerization
  8. AWS ECR & EC2 - Image registry & Cloud deployment
  9. GitHub Actions - Automated CI/CD
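
The notebooks in step 1 compare bag-of-words and TF-IDF features. The sketch below shows the idea in plain Python: bag-of-words counts terms per document, while TF-IDF down-weights terms that occur in many documents. It uses the textbook weighting `tf * log(N / df)`, which is an assumption; libraries such as scikit-learn apply a smoothed variant, and the notebooks may be configured differently.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return one term-count Counter per document."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    bows = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for bow in bows for term in bow)
    weighted = []
    for bow in bows:
        total = sum(bow.values())
        weighted.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bow.items()
        })
    return weighted


if __name__ == "__main__":
    docs = ["great video", "bad video", "great great content"]
    for doc, weights in zip(docs, tf_idf(docs)):
        print(doc, weights)
```

A term that appears in every document gets weight zero under this formula, which is why common words contribute little to a TF-IDF model.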

🔧 Setup Instructions

Step 1: Clone Repository

git clone https://github.com/yourusername/mlops-sentiment-analysis.git
cd mlops-sentiment-analysis

Step 2: Setup MLflow Tracking Server (EC2 Instance #1)

2.1 Launch EC2 Instance

# Launch Ubuntu EC2 instance (t2.medium recommended)
# Security Group: Open ports 22 (SSH), 5000 (MLflow)

2.2 SSH into MLflow Server

ssh -i your-key.pem ubuntu@<mlflow-ec2-public-ip>

2.3 Install Dependencies

# Update system
sudo apt update && sudo apt upgrade -y

# Install Python and pip
sudo apt install python3-pip python3-venv -y

# Create virtual environment
python3 -m venv mlflow_env
source mlflow_env/bin/activate

# Install MLflow, its dependencies, and the AWS CLI (needed for `aws configure` in step 2.4)
pip install mlflow boto3 pymysql awscli

2.4 Configure S3 for Artifact Storage

# Configure AWS credentials
aws configure
# Enter: AWS Access Key ID, Secret Access Key, Region (e.g., us-east-1)

# Create S3 bucket for MLflow artifacts
aws s3 mb s3://your-mlflow-artifacts-bucket

2.5 Start MLflow Server

mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root s3://your-mlflow-artifacts-bucket \
  --host 0.0.0.0 \
  --port 5000

✅ MLflow UI accessible at: http://<mlflow-ec2-public-ip>:5000
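
Before pointing training runs at the server, it can help to confirm that it answers. The sketch below polls the tracking server's `/health` endpoint using only the standard library. The function names, retry count, and delay are illustrative choices, and the `opener` parameter exists only so the check can be exercised without a network.

```python
import time
import urllib.error
import urllib.request


def health_url(host, port=5000):
    """Build the health-check URL for an MLflow tracking server."""
    return f"http://{host}:{port}/health"


def wait_until_healthy(url, attempts=5, delay=2.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not reachable yet; retry after a pause.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Placeholder host from this guide; replace it with the instance's public IP
    # and then call wait_until_healthy(url).
    url = health_url("<mlflow-ec2-public-ip>")
    print("Health check URL:", url)
```

If the check keeps failing, the usual suspects are a security group that does not allow port 5000 or a server started without `--host 0.0.0.0`.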


Step 3: Local Development & Training

3.1 Setup Local Environment

# On your local machine
cd mlops-sentiment-analysis

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

3.2 Configure Environment Variables

# Create .env file
nano .env

Add:

YOUTUBE_DATA_API_V3=your_youtube_api_key_here
MLFLOW_TRACKING_URI=http://<mlflow-ec2-public-ip>:5000
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=us-east-1
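
A minimal sketch of how these values could be read and validated at startup, using only the standard library. Projects often use the `python-dotenv` package for this; whether this repository does is not stated, so the parser and the names `parse_env`, `missing_keys`, and `load_env` below are assumptions for illustration.

```python
import os

# Keys listed in the .env file above.
REQUIRED_KEYS = (
    "YOUTUBE_DATA_API_V3",
    "MLFLOW_TRACKING_URI",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
)


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def load_env(path=".env"):
    """Read `path` and export its values without overriding existing variables."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Calling `missing_keys(load_env())` early and failing fast on a non-empty result surfaces a misconfigured `.env` before any pipeline stage runs.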

3.3 Initialize DVC

# Initialize DVC (skip this if the cloned repository already contains a .dvc/ directory)
dvc init

# Add S3 remote for data storage
dvc remote add -d storage s3://your-dvc-data-bucket/dvc-store
dvc remote modify storage region us-east-1

# Pull existing data (if available)
dvc pull

3.4 Run Training Pipeline

# Run complete pipeline
dvc repro

# Or run stages individually
python src/data_ingestion.py
python src/data_preprocessing.py
python src/model_building.py
python src/model_evaluation.py
python src/model_registry.py

✅ Check MLflow UI to verify experiments are logged
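
The individual stages depend on each other's outputs, so order matters when running them by hand. The sketch below runs the five scripts in the order listed above and stops at the first failure. It is a hypothetical helper, not part of the repository, and unlike `dvc repro` it reruns every stage rather than skipping those whose inputs are unchanged.

```python
import subprocess
import sys

# Stage scripts in the order listed in this guide.
STAGES = [
    "src/data_ingestion.py",
    "src/data_preprocessing.py",
    "src/model_building.py",
    "src/model_evaluation.py",
    "src/model_registry.py",
]


def run_stages(stages=STAGES, runner=subprocess.run):
    """Run each stage in order and stop at the first non-zero exit code.

    Returns the list of stages that completed successfully.
    """
    completed = []
    for script in stages:
        # Use the current interpreter so the active virtual environment applies.
        result = runner([sys.executable, script])
        if result.returncode != 0:
            print(f"Stage failed: {script} (exit code {result.returncode})")
            break
        completed.append(script)
    return completed
```

From the repository root, `run_stages()` executes the full sequence; the `runner` parameter is there only so the control flow can be tested without launching real processes.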


Step 4: Deploy Flask API (EC2 Instance #2)

4.1 Launch EC2 Instance

# Launch Ubuntu EC2 instance (t2.medium recommended)
# Security Group: Open ports 22 (SSH), 80 (HTTP), 5000 (Flask)

4.2 SSH into API Server

ssh -i your-key.pem ubuntu@<api-ec2-public-ip>

4.3 Install Docker

# Install Docker
sudo apt update
sudo apt install docker.io -y
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker ubuntu

# Log out and log back in for group changes
exit
ssh -i your-key.pem ubuntu@<api-ec2-public-ip>

4.4 Upload Deployment Files

# On your local machine, copy deployment folder
scp -i your-key.pem -r aws_deployment ubuntu@<api-ec2-public-ip>:~/

4.5 Create .env File on Server

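# On EC2 API server
# Create the directory first; nano cannot save into a folder that does not exist
mkdir -p /home/ubuntu/mlops-app
nano /home/ubuntu/mlops-app/.env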

Add:

YOUTUBE_DATA_API_V3=your_youtube_api_key
MLFLOW_TRACKING_URI=http://<mlflow-ec2-public-ip>:5000
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=us-east-1

4.6 Setup AWS ECR

# On your local machine
# Create ECR repository
aws ecr create-repository --repository-name sentiment-analysis --region us-east-1

# Authenticate Docker with ECR (the login token is piped straight into docker login)
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com

4.7 Build and Push Docker Image

Build on EC2

# On EC2 API server
cd ~/aws_deployment

# Login to ECR (requires the AWS CLI to be installed and configured on this instance)
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com

# Build image
docker build -t sentiment-analysis:latest -f ../Dockerfile ..

# Tag and push
docker tag sentiment-analysis:latest <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest
docker push <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest
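
The same image reference appears in the login, tag, push, and pull commands, so a typo in any one of them breaks the flow. As a small sketch, the helper below assembles that reference from its parts. The function names are hypothetical and the account ID in the example is a dummy value.

```python
import re


def ecr_registry(account_id, region="us-east-1"):
    """Return the ECR registry host for an AWS account and region."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id, repository, tag="latest", region="us-east-1"):
    """Return the full image reference used by docker tag, push, and pull."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


if __name__ == "__main__":
    # 123456789012 is a dummy account ID used only for illustration.
    print(ecr_image_uri("123456789012", "sentiment-analysis"))
```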

4.8 Run Docker Container

# Pull from ECR
docker pull <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest

# Run container
docker run -d \
  -p 80:5000 \
  --env-file /home/ubuntu/mlops-app/.env \
  --name sentiment-api \
  <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest

# Check logs
docker logs sentiment-api

# Check if running
docker ps

✅ API accessible at: http://<api-ec2-public-ip>

4.9 Test API

curl http://<api-ec2-public-ip>/
curl -X POST http://<api-ec2-public-ip>/predict \
  -H "Content-Type: application/json" \
  -d '{"comments": ["This is amazing!", "Not good"]}'
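
For scripted checks, the POST request above can also be sent from Python. The sketch below mirrors the curl call using only the standard library. It assumes the endpoint answers with a JSON body; the response schema is not documented here, so the result is returned as-is without interpreting it.

```python
import json
import urllib.request


def build_predict_request(base_url, comments):
    """Build a POST request for /predict with a JSON body of comments."""
    body = json.dumps({"comments": comments}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, comments, timeout=10):
    """Send the request and return the decoded JSON response (assumed JSON)."""
    request = build_predict_request(base_url, comments)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `predict("http://<api-ec2-public-ip>", ["This is amazing!", "Not good"])`, with the placeholder replaced by the API server's address.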

Step 5: Install Chrome Extension

5.1 Update API Endpoint

# On your local machine
cd frontend_chrome_ext
nano popup.js

Update the API URL:

const API_URL = 'http://<api-ec2-public-ip>';  // Change this line

5.2 Load Extension in Chrome

  1. Open Chrome browser
  2. Go to chrome://extensions/
  3. Enable Developer mode (top right toggle)
  4. Click Load unpacked
  5. Select the frontend_chrome_ext/ folder
  6. Extension icon should appear in toolbar

5.3 Test Extension

  1. Go to any YouTube video
  2. Click the extension icon
  3. View sentiment results, charts, and visualizations
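
To analyze a video, something in the system has to turn the page URL into a video ID and request its comments. This guide does not say whether the extension or the Flask API does that, so the sketch below is only an assumption about how it could work: it extracts the ID from a watch URL and builds a `commentThreads.list` request URL for the YouTube Data API v3. The function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

COMMENT_THREADS_ENDPOINT = "https://www.googleapis.com/youtube/v3/commentThreads"


def extract_video_id(url):
    """Return the video ID from a watch or youtu.be URL, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    is_youtube = host == "youtube.com" or host.endswith(".youtube.com")
    if is_youtube and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def comment_threads_url(video_id, api_key, max_results=100, page_token=None):
    """Build a commentThreads.list URL for the YouTube Data API v3."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,  # The API accepts values from 1 to 100.
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        # Token from a previous response, used to fetch the next page.
        params["pageToken"] = page_token
    return f"{COMMENT_THREADS_ENDPOINT}?{urlencode(params)}"
```

The API key would come from the `YOUTUBE_DATA_API_V3` variable configured earlier.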

Project Structure

mlops-sentiment-analysis/
├── .dvc/
├── .github/
│   └── workflows/
│       └── cicd.yaml
├── .venv/
├── aws_deployment/
├── content/
├── data/
├── flask_app/
│   └── app.py
├── frontend_chrome_ext/
│   ├── manifest.json
│   ├── popup.html
│   └── popup.js
├── mlruns/
├── models/
├── notebooks/
│   ├── 01_data_collection_&_eda.ipynb
│   ├── 02_baseline_model.ipynb
│   ├── 03_bag_of_words.ipynb
│   ├── 04_tf_idf.ipynb
│   └── reddit_preprocessed.csv
├── src/
│   ├── data_ingestion.py
│   ├── data_preprocessing.py
│   ├── model_building.py
│   ├── model_evaluation.py
│   └── model_registry.py
├── .dockerignore
├── .dvcignore
├── .env
├── .gitignore
├── Dockerfile
├── dvc.lock
├── dvc.yaml
├── params.yaml
├── readme.ipynb
├── README.md
└── requirements.txt

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


Star this repo if you find it helpful!