The North Jersey Project is a scalable, cloud-native conversational AI service that simulates the persona of Tony Soprano. It showcases the full lifecycle of a production-ready AI application: specialized model fine-tuning, containerized deployment, and comprehensive system observability.
We built a specialized chatbot capable of character-accurate dialogue generation by fine-tuning a Large Language Model (LLM) and deploying it on high-performance, auto-scaling infrastructure.
- Utilized Microsoft Phi-3 Mini Instruct, a lightweight 3.8B parameter model selected for its efficiency and reasoning capabilities.
- Performed Low-Rank Adaptation (LoRA) on a Google Colab TPU.
- Adjusted only ~0.33% of the model parameters.
- Achieved a character-specific linguistic style with a training loss of 0.91.
- Curated a specialized dataset of The Sopranos dialogue.
- Supplemented with synthetic data to improve character behavior.
- Converted the model from PyTorch to GGUF format.
- Utilized llama.cpp for significantly improved inference speed in production.
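As a rough sanity check on the ~0.33% figure above, the sketch below estimates LoRA adapter size using Phi-3 Mini's published dimensions (32 layers, hidden size 3072, MLP intermediate size 8192, fused `qkv_proj` and `gate_up_proj` projections). The rank and target modules are assumptions for illustration, not this project's actual training configuration.

```python
# Estimate the fraction of parameters a LoRA adapter trains on Phi-3 Mini.
# Architecture dims come from Phi-3 Mini's published config; the rank and
# target modules are illustrative assumptions, not this project's setup.

HIDDEN = 3072          # hidden size
INTERMEDIATE = 8192    # MLP intermediate size
LAYERS = 32            # transformer layers
TOTAL_PARAMS = 3.8e9   # ~3.8B base parameters
RANK = 8               # assumed LoRA rank

# (in_features, out_features) of the linear layers LoRA commonly targets;
# Phi-3 fuses q/k/v into qkv_proj and gate/up into gate_up_proj.
targets = {
    "qkv_proj": (HIDDEN, 3 * HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_up_proj": (HIDDEN, 2 * INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each LoRA pair adds an (in x r) and an (r x out) matrix per layer.
adapter_params = LAYERS * sum(
    RANK * (fan_in + fan_out) for fan_in, fan_out in targets.values()
)
fraction = adapter_params / TOTAL_PARAMS
print(f"{adapter_params:,} trainable params ≈ {fraction:.2%} of the base model")
# → 12,582,912 trainable params ≈ 0.33% of the base model
```

With these assumed dimensions, a rank-8 adapter over the attention and MLP projections lands almost exactly on the ~0.33% trainable fraction quoted above.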
- Deployed on Google Kubernetes Engine (GKE).
- Used C2 (Compute-Optimized) nodes with 8 vCPUs to handle heavy inference loads.
- Implemented a Horizontal Pod Autoscaler (HPA).
- Automatically spins up new pods when CPU usage exceeds 50%.
- Leveraged GKE Spot VMs to reduce infrastructure costs while maintaining high availability.
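The autoscaling behavior above can be expressed as a standard `autoscaling/v2` HPA manifest; the sketch below is illustrative (the deployment name and replica bounds are assumptions, not taken from the project's actual Helm chart).

```yaml
# Illustrative HPA sketch; deployment name and replica bounds are assumed.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tony-backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tony-backend
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```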
- Built with Next.js (TypeScript/React).
- Deployed on Vercel with automatic SSL provisioning.
- Integrated Prometheus to scrape real-time cluster metrics.
- Provides visibility into CPU and memory spikes.
- Built custom Grafana dashboards.
- Tracks pod health and request latency.
- Verifies that auto-scaling triggers correctly under load.
- Managed deployments using Helm for simplified Kubernetes configuration.
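Dashboards like those described above typically sit on a handful of PromQL queries. The examples below are illustrative: the first uses the standard cAdvisor CPU metric GKE exposes, and the second assumes the backend exports a conventional `http_request_duration_seconds` histogram (the namespace and metric names are assumptions).

```promql
# Per-pod CPU usage over the last 5 minutes (cAdvisor metric)
rate(container_cpu_usage_seconds_total{namespace="default"}[5m])

# p95 request latency, assuming the backend exports a
# http_request_duration_seconds histogram
histogram_quantile(0.95,
  rate(http_request_duration_seconds_bucket[5m]))
```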
- Microsoft Phi-3
- LoRA
- Hugging Face
- llama.cpp (GGUF)
- FastAPI
- Python
- Pydantic
- Next.js
- React
- TypeScript
- Tailwind CSS
- Docker
- Kubernetes (GKE)
- Helm
- Prometheus
- Grafana
- Vercel
- Backend: FastAPI server using llama.cpp with GGUF model format
- Frontend: Next.js application with TypeScript and Tailwind CSS
- Model: Fine-tuned Phi-3 model with custom adapter weights
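A minimal sketch of how these pieces fit together: the backend renders multi-turn conversations into Phi-3's instruct chat template before handing the string to llama.cpp for generation. The function name and example content are illustrative, not the project's actual code.

```python
# Render a multi-turn conversation into Phi-3's instruct chat template.
# Illustrative sketch, not the project's actual backend code; the resulting
# string would be passed to a llama.cpp model loaded from tony.gguf.

def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """turns is a list of (role, text) pairs, role in {"user", "assistant"}."""
    parts = [f"<|system|>\n{system}<|end|>\n"]
    for role, text in turns:
        parts.append(f"<|{role}|>\n{text}<|end|>\n")
    # A trailing assistant tag tells the model to generate the next reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = build_prompt(
    "You are Tony Soprano.",
    [("user", "How's business?")],
)
print(prompt)
```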
- Python 3.9+
- Node.js 18+
- `tony.gguf` model file (place in project root)
```bash
pip install -r requirements.txt
pip install llama-cpp-python
python main.py
```

The server runs on http://localhost:8000.
```bash
cd frontend
npm install
npm run dev
```

Set the `NEXT_PUBLIC_API_ENDPOINT` environment variable to your backend URL.
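For local development, this can live in a `.env.local` file under `frontend/` (the URL below assumes the default backend address from the setup steps above):

```bash
# frontend/.env.local
NEXT_PUBLIC_API_ENDPOINT=http://localhost:8000
```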
- Streaming responses via Server-Sent Events (SSE)
- Multi-turn conversation with context management
- Token-based input limits and history truncation
- Prometheus metrics instrumentation
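The token-based history truncation can be sketched as below. The whitespace "tokenizer" and the budget are stand-ins for the real model tokenizer and context window; function names are illustrative, not the project's actual code.

```python
# Drop the oldest conversation turns until the history fits a token budget.
# A real implementation would count tokens with the model's tokenizer;
# whitespace splitting here is a stand-in, and the budget is illustrative.

def count_tokens(text: str) -> int:
    return len(text.split())

def truncate_history(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):       # walk newest-first
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))        # restore chronological order

history = [
    "so what did the doctor say",
    "nothing you need to worry about",
    "come on Tone talk to me",
    "eh what are you gonna do",
]
trimmed = truncate_history(history, budget=12)
print(trimmed)
# → ['come on Tone talk to me', 'eh what are you gonna do']
```

Walking the history newest-first guarantees the most recent context survives; older turns are the ones sacrificed when the budget is exceeded.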
Docker and Kubernetes configurations are included. The Dockerfile builds a containerized version of the backend service.
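The included Dockerfile is not reproduced here; a minimal equivalent for the backend might look like the sketch below (base image, file names, and port are assumptions derived from the setup steps above, not the project's actual Dockerfile).

```dockerfile
# Illustrative sketch, not the project's actual Dockerfile.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt llama-cpp-python
COPY . .
# The tony.gguf model is expected in the project root (see setup above).
EXPOSE 8000
CMD ["python", "main.py"]
```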
Training data and scripts are located in `data/` and `training/`. The model combines base Phi-3 weights with a custom LoRA adapter trained on character-specific dialogue.