A production-ready sentiment analysis system for YouTube comments, built with classical ML and deployed on AWS using end-to-end MLOps practices (experiment tracking, data versioning, and CI/CD).
- Sentiment distribution pie chart
- Word cloud visualization
- Top 25 comments
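The distribution behind the pie chart is just a count of predicted labels per comment. A minimal sketch, assuming the model emits one of `positive` / `neutral` / `negative` per comment (the label names are an assumption, not taken from the repo):

```python
from collections import Counter

# Hypothetical predictions for a batch of comments; in the real app
# these come from the model served by the Flask API.
predictions = ["positive", "negative", "positive", "neutral", "positive"]

counts = Counter(predictions)
total = len(predictions)

# Percentages feed the pie chart directly.
distribution = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(distribution)  # {'positive': 60.0, 'negative': 20.0, 'neutral': 20.0}
```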
Notebooks → MLflow → SRC → DVC → Flask API → Chrome Extension → Docker → AWS ECR & EC2 → GitHub Actions
- Notebooks - Experiment with models (Baseline, BoW, TF-IDF)
- MLflow - Track experiments on dedicated EC2 + S3
- SRC - Production pipeline modules
- DVC - Version control for data/models
- Flask API - REST API with visualization endpoints
- Chrome Extension - Browser UI for YouTube
- Docker - Containerization
- AWS ECR & EC2 - Image registry & Cloud deployment
- GitHub Actions - Automated CI/CD
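To make the Flask API component concrete, here is a minimal sketch of a `/predict` endpoint. The route and request shape match the curl example later in this README, but the handler body and the placeholder classifier are illustrative only; the real app would call the model pulled from the MLflow registry:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def classify(comment: str) -> str:
    # Placeholder classifier; the production app would run the trained
    # TF-IDF + scikit-learn pipeline instead of this keyword check.
    return "positive" if "amazing" in comment.lower() else "negative"

@app.route("/predict", methods=["POST"])
def predict():
    comments = request.get_json().get("comments", [])
    results = [{"comment": c, "sentiment": classify(c)} for c in comments]
    return jsonify(results)
```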
git clone https://github.com/yourusername/mlops-sentiment-analysis.git
cd mlops-sentiment-analysis

# Launch Ubuntu EC2 instance (t2.medium recommended)
# Security Group: Open ports 22 (SSH), 5000 (MLflow)

ssh -i your-key.pem ubuntu@<mlflow-ec2-public-ip>

# Update system
sudo apt update && sudo apt upgrade -y
# Install Python and pip
sudo apt install python3-pip python3-venv -y
# Create virtual environment
python3 -m venv mlflow_env
source mlflow_env/bin/activate
# Install MLflow and dependencies
pip install mlflow boto3 pymysql

# Configure AWS credentials
aws configure
# Enter: AWS Access Key ID, Secret Access Key, Region (e.g., us-east-1)
# Create S3 bucket for MLflow artifacts
aws s3 mb s3://your-mlflow-artifacts-bucket

mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root s3://your-mlflow-artifacts-bucket \
--host 0.0.0.0 \
--port 5000

✅ MLflow UI accessible at: http://<mlflow-ec2-public-ip>:5000
# On your local machine
cd mlops-sentiment-analysis
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Create .env file
nano .env

Add:
YOUTUBE_DATA_API_V3=your_youtube_api_key_here
MLFLOW_TRACKING_URI=http://<mlflow-ec2-public-ip>:5000
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=us-east-1

# Initialize DVC
dvc init
# Add S3 remote for data storage
dvc remote add -d storage s3://your-dvc-data-bucket/dvc-store
dvc remote modify storage region us-east-1
# Pull existing data (if available)
dvc pull

# Run complete pipeline
dvc repro
# Or run stages individually
python src/data_ingestion.py
python src/data_preprocessing.py
python src/model_building.py
python src/model_evaluation.py
python src/model_registry.py

✅ Check MLflow UI to verify experiments are logged
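The five stages above are chained in `dvc.yaml` (with hyperparameters in `params.yaml`), which is what `dvc repro` executes. The stage names, deps, and outs below are an illustrative sketch, not copied from the repo:

```yaml
stages:
  data_ingestion:
    cmd: python src/data_ingestion.py
    deps:
      - src/data_ingestion.py
    outs:
      - data/raw
  data_preprocessing:
    cmd: python src/data_preprocessing.py
    deps:
      - src/data_preprocessing.py
      - data/raw
    outs:
      - data/processed
  model_building:
    cmd: python src/model_building.py
    params:
      - model_building
    deps:
      - src/model_building.py
      - data/processed
    outs:
      - models/model.pkl
```

Because each stage declares its deps and outs, `dvc repro` reruns only the stages whose inputs changed.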
# Launch Ubuntu EC2 instance (t2.medium recommended)
# Security Group: Open ports 22 (SSH), 80 (HTTP), 5000 (Flask)

ssh -i your-key.pem ubuntu@<api-ec2-public-ip>

# Install Docker
sudo apt update
sudo apt install docker.io -y
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker ubuntu
# Log out and log back in for group changes
exit
ssh -i your-key.pem ubuntu@<api-ec2-public-ip>

# On your local machine, copy deployment folder
scp -i your-key.pem -r aws_deployment ubuntu@<api-ec2-public-ip>:~/

# On EC2 API server
nano /home/ubuntu/mlops-app/.env

Add:
YOUTUBE_DATA_API_V3=your_youtube_api_key
MLFLOW_TRACKING_URI=http://<mlflow-ec2-public-ip>:5000
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=us-east-1

# On your local machine
# Create ECR repository
aws ecr create-repository --repository-name sentiment-analysis --region us-east-1
# Get ECR login token
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com

Build on EC2:
# On EC2 API server
cd ~/aws_deployment
# Login to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com
# Build image
docker build -t sentiment-analysis:latest -f ../Dockerfile ..
# Tag and push
docker tag sentiment-analysis:latest <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest
docker push <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest

# Pull from ECR
docker pull <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest
# Run container
docker run -d \
-p 80:5000 \
--env-file /home/ubuntu/mlops-app/.env \
--name sentiment-api \
<aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest
# Check logs
docker logs sentiment-api
# Check if running
docker ps

✅ API accessible at: http://<api-ec2-public-ip>
curl http://<api-ec2-public-ip>/
curl -X POST http://<api-ec2-public-ip>/predict \
-H "Content-Type: application/json" \
-d '{"comments": ["This is amazing!", "Not good"]}'

# On your local machine
cd frontend_chrome_ext
nano popup.js

Update the API URL:
const API_URL = 'http://<api-ec2-public-ip>'; // Change this line

- Open Chrome browser
- Go to chrome://extensions/
- Enable Developer mode (top right toggle)
- Click Load unpacked
- Select the frontend_chrome_ext/ folder
- Extension icon should appear in toolbar
- Go to any YouTube video
- Click the extension icon
- View sentiment results, charts, and visualizations
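Under the hood, the popup issues the same POST to `/predict` shown in the curl example above. A small Python helper for checking the payload shape offline (hypothetical; it only builds the request without sending it):

```python
import json
import urllib.request

API_URL = "http://127.0.0.1"  # replace with http://<api-ec2-public-ip>

def build_predict_request(comments):
    """Build the POST /predict request without sending it."""
    body = json.dumps({"comments": comments}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send with urllib.request.urlopen(build_predict_request([...]))
# once the API is reachable.
```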
MLOPS-sentiment-analysis/
├── .dvc/
├── .github/
│   └── workflows/
│       └── cicd.yaml
├── .venv/
├── aws_deployment/
├── content/
├── data/
├── flask_app/
│   └── app.py
├── frontend_chrome_ext/
│   ├── manifest.json
│   ├── popup.html
│   └── popup.js
├── mlruns/
├── models/
├── notebooks/
│   ├── 01_data_collection_&_eda.ipynb
│   ├── 02_baseline_model.ipynb
│   ├── 03_bag_of_words.ipynb
│   ├── 04_tf_idf.ipynb
│   └── reddit_preprocessed.csv
├── src/
│   ├── data_ingestion.py
│   ├── data_preprocessing.py
│   ├── model_building.py
│   ├── model_evaluation.py
│   └── model_registry.py
├── .dockerignore
├── .dvcignore
├── .env
├── .gitignore
├── Dockerfile
├── dvc.lock
├── dvc.yaml
├── params.yaml
├── readme.ipynb
├── README.md
└── requirements.txt
- YouTube Data API v3 for comment data collection
- scikit-learn for machine learning algorithms
- MLflow for experiment tracking and model registry
- DVC for data and pipeline versioning
- Flask for the REST API framework
- NLTK for natural language processing
- pandas for data manipulation
- Docker for containerization
- AWS for cloud infrastructure (EC2, ECR, S3)
- GitHub Actions for CI/CD automation
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Star this repo if you find it helpful!
