Welcome to the comprehensive documentation for the Korean Subtitle Extractor MVP - a micro-SaaS application that extracts hardcoded Korean subtitles from YouTube videos using OCR and optionally translates them to English.
This documentation provides everything you need to understand, develop, deploy, and maintain the Korean Subtitle Extractor project.
| Document | Description | Audience |
|---|---|---|
| Project Structure | Complete project architecture and component breakdown | All developers |
| Development Guide | Setup, development workflow, and troubleshooting | Developers |
| API Documentation | Complete REST API and WebSocket documentation | Frontend developers, API consumers |
| Deployment Guide | Production deployment, security, and monitoring | DevOps, System administrators |
| Error Learnings | Complete error documentation (29+ errors, 946 lines) | All developers |
| Error Summary | Condensed error patterns and prevention strategies | Project leads, Senior developers |
- New to the project? → Start with Project Structure
- Setting up development? → Follow Development Guide
- Integrating with the API? → Check API Documentation
- Deploying to production? → Use Deployment Guide
- Debugging issues? → Search Error Learnings
- YouTube Video Processing: Extract subtitles from videos up to 20 minutes
- Advanced OCR: Google Cloud Vision API for accurate Korean text recognition
- Smart Cropping: Remove YouTube channel logos before OCR processing
- Smart Deduplication: Remove duplicate subtitles while preserving timing
- Bulk Translation: Translate Korean text to English with proper context
- Optional Translation: Google Cloud Translate for Korean to English conversion
- SRT Generation: Properly formatted subtitle files for both languages
- Real-time Progress: WebSocket updates during processing
- Responsive Design: Works on desktop, tablet, and mobile devices
- Comprehensive Testing: Unit, integration, and end-to-end test coverage
- Frontend: React 18 + TypeScript + Vite + Tailwind CSS
- Backend: Python FastAPI + SQLAlchemy + Redis
- External APIs: Google Cloud Vision (OCR) + Google Cloud Translate
- Database: SQLite (development) / PostgreSQL (production)
- Testing: Pytest + Vitest + Playwright
korean-subtitle-extractor/
├── frontend/                  # React TypeScript app
│   ├── src/
│   │   ├── components/        # UI components
│   │   ├── hooks/             # Custom React hooks
│   │   ├── services/          # API client
│   │   ├── types/             # TypeScript definitions
│   │   └── test/              # Frontend tests
│   ├── tests/e2e/             # Playwright E2E tests
│   └── playwright.config.ts
├── backend/                   # Python FastAPI
│   ├── app/
│   │   ├── api/v1/            # API endpoints
│   │   ├── services/          # Business logic
│   │   ├── models/            # Database models
│   │   ├── utils/             # Utilities
│   │   └── main.py
│   ├── tests/
│   │   ├── unit/              # Unit tests
│   │   └── integration/       # Integration tests
│   └── requirements.txt
└── README.md
- Python 3.9+
- Node.js 18+
- Redis Server
- Google Cloud Project with Vision API and Translate API enabled
- Google Cloud Service Account with appropriate permissions
git clone <repository-url>
cd korean-subtitle-extractor
cd backend
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
cd frontend
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
- Create a Google Cloud Project
- Enable the following APIs:
- Cloud Vision API
- Cloud Translation API
- Create a Service Account
- Download the service account JSON key
- Copy the entire JSON content and paste it as a single-line string in your backend `.env` file:
GOOGLE_APPLICATION_CREDENTIALS={"type":"service_account","project_id":"...","private_key":"...","client_email":"..."}
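Because the credentials live in `.env` as a JSON string rather than a file path, it can be worth validating them at startup. A minimal sketch (the helper name and the required-key set are illustrative, not part of the project's code):

```python
import json

# Fields the Google Cloud client libraries need from a service-account key.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}

def load_service_account_info(raw: str) -> dict:
    """Parse the GOOGLE_APPLICATION_CREDENTIALS JSON string and verify
    that the fields the Vision/Translate clients rely on are present."""
    info = json.loads(raw)
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"credentials JSON missing keys: {sorted(missing)}")
    return info

# The resulting dict can then be handed to
# google.oauth2.service_account.Credentials.from_service_account_info(info)
# when constructing the Vision and Translate clients.
```

Failing fast here gives a clear error message instead of an opaque authentication failure on the first OCR request.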
# Google Cloud (JSON string containing service account credentials)
GOOGLE_APPLICATION_CREDENTIALS={"type":"service_account","project_id":"your-project-id","private_key_id":"","private_key":"-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY_HERE\n-----END PRIVATE KEY-----\n","client_email":"your-service-account@your-project.iam.gserviceaccount.com","client_id":"","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"","universe_domain":"googleapis.com"}
GOOGLE_CLOUD_PROJECT=your-project-id
# Database
DATABASE_URL=sqlite:///./app.db
# Redis
REDIS_URL=redis://localhost:6379
# API Settings
MAX_VIDEO_DURATION=1200 # 20 minutes
CORS_ORIGINS=http://localhost:5173  # frontend dev server origin (Vite default)
# Processing Settings
FRAME_CROP_RATIO=0.1
OCR_BATCH_SIZE=10
# Analytics (Optional)
GA4_MEASUREMENT_ID=G-XXXXXXXXXX
CLARITY_PROJECT_ID=your-clarity-id
# API Configuration
VITE_API_URL=http://localhost:8000
VITE_WS_URL=ws://localhost:8000
# Analytics (Optional)
VITE_GA4_MEASUREMENT_ID=G-XXXXXXXXXX
VITE_CLARITY_PROJECT_ID=your-clarity-id
cd backend
# Run database migrations (if using Alembic)
alembic upgrade head
# Windows (if installed via Chocolatey)
redis-server
# macOS (if installed via Homebrew)
brew services start redis
# Linux
sudo systemctl start redis
cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
cd frontend
npm run dev
The application will be available at:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
cd backend
# Run all tests
pytest
# Run with coverage
pytest --cov=app --cov-report=html
# Run specific test categories
pytest tests/unit/ # Unit tests only
pytest tests/integration/ # Integration tests only
cd frontend
# Run unit/component tests
npm run test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
cd frontend
# Install Playwright browsers (first time only)
npx playwright install
# Run E2E tests
npm run test:e2e
# Run E2E tests with UI
npm run test:e2e:ui
cd backend
# Linting
pylint app/
# Formatting
black app/
isort app/
# Type checking
mypy app/
cd frontend
# Linting
npm run lint
# Fix linting issues
npm run lint:fix
# Type checking
npm run type-check
- Video Download: yt-dlp extracts video info and validates duration (≤20 min)
- Frame Extraction: OpenCV extracts 1 frame per second with timestamps
- Frame Cropping: Remove 10% from sides to eliminate channel logos
- OCR Processing: Google Cloud Vision API batch processes frames (10 frames/request)
- Text Deduplication: Remove duplicate text using 90% similarity threshold
- Bulk Translation: Translate all Korean texts together for better context
- SRT Generation: Create properly formatted subtitle files
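Steps 5 and 7 of the pipeline can be sketched roughly as follows (illustrative function names; the actual services live under `backend/app/services/`). Deduplication compares each OCR result against the last kept subtitle using a 90% similarity ratio, and SRT output needs `HH:MM:SS,mmm` timestamps:

```python
from difflib import SequenceMatcher

def dedupe_subtitles(entries, threshold=0.9):
    """Drop consecutive OCR results whose text is >= `threshold` similar,
    keeping the timestamp of the first occurrence.

    `entries` is a list of (timestamp_seconds, text) pairs in frame order."""
    kept = []
    for ts, text in entries:
        if kept and SequenceMatcher(None, kept[-1][1], text).ratio() >= threshold:
            continue  # near-duplicate of the previous subtitle
        kept.append((ts, text))
    return kept

def to_srt_timestamp(seconds):
    """Format a float number of seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```

Comparing only against the *previous* kept entry (rather than all entries) preserves timing when the same line legitimately reappears later in the video.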
- Enter YouTube URL: Paste a YouTube video URL (max 20 minutes)
- Choose Translation: Toggle English translation on/off
- Start Processing: Click "Process Video" to begin
- Monitor Progress: Watch real-time progress updates
- Download SRT Files: Download Korean and/or English subtitle files
Once the backend is running, visit:
- Interactive API Docs: http://localhost:8000/docs
- ReDoc Documentation: http://localhost:8000/redoc
- POST /api/v1/process - Start video processing
- GET /api/v1/status/{job_id} - Get job status
- GET /api/v1/download/{job_id}/{language} - Download SRT file
- WebSocket /ws/progress/{job_id} - Real-time progress updates
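A minimal client flow against these endpoints might look like the sketch below. The base URL assumes the local dev server, and the request/response field names (`url`, `translate`, `job_id`) are illustrative; check the interactive docs at `/docs` for the authoritative schema:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev server

def start_processing(youtube_url, translate=True):
    """POST the YouTube URL to /api/v1/process and return the job id."""
    body = json.dumps({"url": youtube_url, "translate": translate}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/process",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]

def status_url(job_id):
    """Poll this URL for job status (or subscribe to /ws/progress/{job_id})."""
    return f"{BASE_URL}/api/v1/status/{job_id}"

def download_url(job_id, language):
    """URL of the finished SRT file, e.g. language 'ko' or 'en'."""
    return f"{BASE_URL}/api/v1/download/{job_id}/{language}"
```

Polling the status endpoint works, but the WebSocket route avoids request overhead during long OCR runs.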
ImportError: No module named 'app'
# Make sure you're in the backend directory and virtual environment is activated
cd backend
source venv/bin/activate  # or venv\Scripts\activate on Windows
Google Cloud Authentication Error
# Verify credentials file exists and path is correct
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
# Test authentication
gcloud auth application-default login
Redis Connection Error
# Check if Redis is running
redis-cli ping
# Start Redis if not running
redis-server
Port 5173 already in use
# Use different port
npm run dev -- --port 3000
TypeScript Errors
# Run type checking
npm run type-check
# Update types
npm run type-check -- --watch
- 1 frame per second extraction rate for accuracy
- 20-minute maximum video duration for MVP
- Frame cropping to remove YouTube channel logos
- Order preservation for OCR results (handle empty frames)
- Bulk translation for better context
- Complete data relationships: frame→timestamp→OCR→translation
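The frame→timestamp→OCR→translation relationship can be pictured as a simple record type. This is an illustrative sketch, not the project's actual model definition (the real `FrameOCRMapping` lives in the backend models):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FrameOCRMapping:
    """Links one extracted frame to its timestamp, OCR result, and
    (optional) translation, so ordering survives every pipeline stage."""
    frame_index: int                     # position in the 1-frame-per-second sequence
    timestamp: float                     # seconds from video start
    korean_text: str                     # "" when OCR found no text in this frame
    english_text: Optional[str] = None   # filled in after bulk translation

    @property
    def is_empty(self) -> bool:
        """Empty frames are kept (not dropped) to preserve ordering."""
        return self.korean_text == ""
```

Keeping one record per frame, even for empty OCR results, is what makes it safe to translate all Korean texts in one bulk request and then map the results back to timestamps by index.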
- Always crop frames before OCR to remove channel logos
- Maintain data relationships using FrameOCRMapping throughout
- Preserve frame order even when OCR returns empty results
- Use bulk translation for better context, then map back to timestamps
- Track analytics events for user behavior insights
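The "always crop frames before OCR" rule reduces to computing horizontal bounds from `FRAME_CROP_RATIO`. A sketch (helper name is illustrative; the real pipeline applies this to OpenCV frames):

```python
def crop_bounds(width, ratio=0.1):
    """Return (left, right) pixel columns that keep the centre of the
    frame, trimming `ratio` of the width from each side.

    With the default ratio of 0.1, 10% is removed from each side,
    which is where YouTube channel logos typically sit."""
    margin = int(width * ratio)
    return margin, width - margin

# With a NumPy/OpenCV frame this would be applied as:
#   left, right = crop_bounds(frame.shape[1], FRAME_CROP_RATIO)
#   cropped = frame[:, left:right]
```

Cropping horizontally only is deliberate: hardcoded Korean subtitles sit near the bottom centre, so trimming the sides removes logos without touching subtitle text.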
The application integrates with:
- Google Analytics 4 for user behavior tracking
- Microsoft Clarity for session recordings and heat maps
- Google Search Console for search performance monitoring
- Input validation for YouTube URLs
- Rate limiting on API endpoints
- Secure file handling and cleanup
- Environment variable protection
- CORS configuration for production
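The YouTube URL validation step can be sketched with a small pattern check. This is illustrative; the project's real validator may accept more URL shapes (shorts, embeds, extra query parameters):

```python
import re

# Matches standard watch URLs and youtu.be short links; video ids are 11 chars.
_YT_PATTERN = re.compile(
    r"^https?://(?:www\.)?"
    r"(?:youtube\.com/watch\?v=|youtu\.be/)"
    r"(?P<video_id>[A-Za-z0-9_-]{11})"
)

def extract_video_id(url: str) -> str:
    """Return the 11-character YouTube video id, or raise ValueError."""
    match = _YT_PATTERN.match(url)
    if not match:
        raise ValueError(f"not a recognised YouTube URL: {url!r}")
    return match.group("video_id")
```

Rejecting malformed URLs before they reach yt-dlp keeps the error in the API layer, where it can be returned as a clean 4xx response instead of a downloader failure mid-job.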
- Fork the repository
- Create a feature branch:
git checkout -b feature/new-feature
- Make your changes and add tests
- Run the test suite:
npm test and pytest
- Commit your changes:
git commit -m 'Add new feature'
- Push to the branch:
git push origin feature/new-feature
- Submit a pull request
- Follow TypeScript strict mode
- Write unit tests for new features
- Update integration tests for API changes
- Follow existing code style and patterns
- Update documentation for new features
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues:
- Check the troubleshooting section above
- Search existing GitHub Issues
- Create a new issue with detailed reproduction steps
- Include logs and error messages
Built with ❤️ for the Korean language learning community