A production-ready sentiment analysis system for YouTube comments, built with classical ML and deployed on AWS using end-to-end MLOps practices (experiment tracking, data versioning, and CI/CD).
- Sentiment distribution pie chart
- Word cloud visualization
- Top 25 comments
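The distribution behind the pie chart is just a count of predicted labels per comment. A minimal sketch, assuming the model emits one of `positive` / `neutral` / `negative` per comment (the label names are an assumption, not taken from the repo):

```python
from collections import Counter

# Hypothetical predictions for a batch of comments; in the real app
# these come from the model served by the Flask API.
predictions = ["positive", "negative", "positive", "neutral", "positive"]

counts = Counter(predictions)
total = len(predictions)

# Percentages feed the pie chart directly.
distribution = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(distribution)  # {'positive': 60.0, 'negative': 20.0, 'neutral': 20.0}
```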
Notebooks → MLflow → SRC → DVC → Flask API → Chrome Extension → Docker → AWS ECR & EC2 → GitHub Actions
- Notebooks - Experiment with models (Baseline, BoW, TF-IDF)
- MLflow - Track experiments on dedicated EC2 + S3
- SRC - Production pipeline modules
- DVC - Version control for data/models
- Flask API - REST API with visualization endpoints
- Chrome Extension - Browser UI for YouTube
- Docker - Containerization
- AWS ECR & EC2 - Image registry & Cloud deployment
- GitHub Actions - Automated CI/CD
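To make the Flask API component concrete, here is a minimal sketch of a `/predict` endpoint. The route and request shape match the curl example later in this README, but the handler body and the placeholder classifier are illustrative only; the real app would call the model pulled from the MLflow registry:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def classify(comment: str) -> str:
    # Placeholder classifier; the production app would run the trained
    # TF-IDF + scikit-learn pipeline instead of this keyword check.
    return "positive" if "amazing" in comment.lower() else "negative"

@app.route("/predict", methods=["POST"])
def predict():
    comments = request.get_json().get("comments", [])
    results = [{"comment": c, "sentiment": classify(c)} for c in comments]
    return jsonify(results)
```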
git clone https://github.com/yourusername/mlops-sentiment-analysis.git
cd mlops-sentiment-analysis

# Launch Ubuntu EC2 instance (t2.medium recommended)
# Security Group: Open ports 22 (SSH), 5000 (MLflow)

ssh -i your-key.pem ubuntu@<mlflow-ec2-public-ip>

# Update system
sudo apt update && sudo apt upgrade -y
# Install Python and pip
sudo apt install python3-pip python3-venv -y
# Create virtual environment
python3 -m venv mlflow_env
source mlflow_env/bin/activate
# Install MLflow and dependencies
pip install mlflow boto3 pymysql

# Configure AWS credentials
aws configure
# Enter: AWS Access Key ID, Secret Access Key, Region (e.g., us-east-1)
# Create S3 bucket for MLflow artifacts
aws s3 mb s3://your-mlflow-artifacts-bucket

mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root s3://your-mlflow-artifacts-bucket \
--host 0.0.0.0 \
--port 5000

✅ MLflow UI accessible at: http://<mlflow-ec2-public-ip>:5000
# On your local machine
cd mlops-sentiment-analysis
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Create .env file
nano .env

Add:
YOUTUBE_DATA_API_V3=your_youtube_api_key_here
MLFLOW_TRACKING_URI=http://<mlflow-ec2-public-ip>:5000
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=us-east-1

# Initialize DVC
dvc init
# Add S3 remote for data storage
dvc remote add -d storage s3://your-dvc-data-bucket/dvc-store
dvc remote modify storage region us-east-1
# Pull existing data (if available)
dvc pull

# Run complete pipeline
dvc repro
# Or run stages individually
python src/data_ingestion.py
python src/data_preprocessing.py
python src/model_building.py
python src/model_evaluation.py
python src/model_registry.py

✅ Check MLflow UI to verify experiments are logged
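The five stages above are chained in `dvc.yaml` (with hyperparameters in `params.yaml`), which is what `dvc repro` executes. The stage names, deps, and outs below are an illustrative sketch, not copied from the repo:

```yaml
stages:
  data_ingestion:
    cmd: python src/data_ingestion.py
    deps:
      - src/data_ingestion.py
    outs:
      - data/raw
  data_preprocessing:
    cmd: python src/data_preprocessing.py
    deps:
      - src/data_preprocessing.py
      - data/raw
    outs:
      - data/processed
  model_building:
    cmd: python src/model_building.py
    params:
      - model_building
    deps:
      - src/model_building.py
      - data/processed
    outs:
      - models/model.pkl
```

Because each stage declares its deps and outs, `dvc repro` reruns only the stages whose inputs changed.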
# Launch Ubuntu EC2 instance (t2.medium recommended)
# Security Group: Open ports 22 (SSH), 80 (HTTP), 5000 (Flask)

ssh -i your-key.pem ubuntu@<api-ec2-public-ip>

# Install Docker
sudo apt update
sudo apt install docker.io -y
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker ubuntu
# Log out and log back in for group changes
exit
ssh -i your-key.pem ubuntu@<api-ec2-public-ip>

# On your local machine, copy deployment folder
scp -i your-key.pem -r aws_deployment ubuntu@<api-ec2-public-ip>:~/

# On EC2 API server
nano /home/ubuntu/mlops-app/.env

Add:
YOUTUBE_DATA_API_V3=your_youtube_api_key
MLFLOW_TRACKING_URI=http://<mlflow-ec2-public-ip>:5000
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=us-east-1

# On your local machine
# Create ECR repository
aws ecr create-repository --repository-name sentiment-analysis --region us-east-1
# Get ECR login token
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com

Build on EC2:
# On EC2 API server
cd ~/aws_deployment
# Login to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com
# Build image
docker build -t sentiment-analysis:latest -f ../Dockerfile ..
# Tag and push
docker tag sentiment-analysis:latest <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest
docker push <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest

# Pull from ECR
docker pull <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest
# Run container
docker run -d \
-p 80:5000 \
--env-file /home/ubuntu/mlops-app/.env \
--name sentiment-api \
<aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/sentiment-analysis:latest
# Check logs
docker logs sentiment-api
# Check if running
docker ps

✅ API accessible at: http://<api-ec2-public-ip>
curl http://<api-ec2-public-ip>/
curl -X POST http://<api-ec2-public-ip>/predict \
-H "Content-Type: application/json" \
-d '{"comments": ["This is amazing!", "Not good"]}'

# On your local machine
cd frontend_chrome_ext
nano popup.js

Update the API URL:
const API_URL = 'http://<api-ec2-public-ip>'; // Change this line

- Open Chrome browser
- Go to chrome://extensions/
- Enable Developer mode (top right toggle)
- Click Load unpacked
- Select the frontend_chrome_ext/ folder
- Extension icon should appear in toolbar
- Go to any YouTube video
- Click the extension icon
- View sentiment results, charts, and visualizations
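Under the hood, the popup issues the same POST to `/predict` shown in the curl example above. A small Python helper for checking the payload shape offline (hypothetical; it only builds the request without sending it):

```python
import json
import urllib.request

API_URL = "http://127.0.0.1"  # replace with http://<api-ec2-public-ip>

def build_predict_request(comments):
    """Build the POST /predict request without sending it."""
    body = json.dumps({"comments": comments}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send with urllib.request.urlopen(build_predict_request([...]))
# once the API is reachable.
```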
MLOPS-sentiment-analysis/
├── .dvc/
├── .github/
│   └── workflows/
│       └── cicd.yaml
├── .venv/
├── aws_deployment/
├── content/
├── data/
├── flask_app/
│   └── app.py
├── frontend_chrome_ext/
│   ├── manifest.json
│   ├── popup.html
│   └── popup.js
├── mlruns/
├── models/
├── notebooks/
│   ├── 01_data_collection_&_eda.ipynb
│   ├── 02_baseline_model.ipynb
│   ├── 03_bag_of_words.ipynb
│   ├── 04_tf_idf.ipynb
│   └── reddit_preprocessed.csv
├── src/
│   ├── data_ingestion.py
│   ├── data_preprocessing.py
│   ├── model_building.py
│   ├── model_evaluation.py
│   └── model_registry.py
├── .dockerignore
├── .dvcignore
├── .env
├── .gitignore
├── Dockerfile
├── dvc.lock
├── dvc.yaml
├── params.yaml
├── readme.ipynb
├── README.md
└── requirements.txt
- YouTube Data API v3 for comment data collection
- scikit-learn for machine learning algorithms
- MLflow for experiment tracking and model registry
- DVC for data and pipeline versioning
- Flask for the REST API framework
- NLTK for natural language processing
- pandas for data manipulation
- Docker for containerization
- AWS for cloud infrastructure (EC2, ECR, S3)
- GitHub Actions for CI/CD automation
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Star this repo if you find it helpful!
