Welcome to the comprehensive documentation for the Korean Subtitle Extractor MVP - a micro-SaaS application that extracts hardcoded Korean subtitles from YouTube videos using OCR and optionally translates them to English.
This documentation provides everything you need to understand, develop, deploy, and maintain the Korean Subtitle Extractor project.
| Document | Description | Audience |
|---|---|---|
| Project Structure | Complete project architecture and component breakdown | All developers |
| Development Guide | Setup, development workflow, and troubleshooting | Developers |
| API Documentation | Complete REST API and WebSocket documentation | Frontend developers, API consumers |
| Deployment Guide | Production deployment, security, and monitoring | DevOps, System administrators |
| Error Learnings | Complete error documentation (29+ errors, 946 lines) | All developers |
| Error Summary | Condensed error patterns and prevention strategies | Project leads, Senior developers |
- New to the project? → Start with Project Structure
- Setting up development? → Follow Development Guide
- Integrating with the API? → Check API Documentation
- Deploying to production? → Use Deployment Guide
- Debugging issues? → Search Error Learnings
- YouTube Video Processing: Extract subtitles from videos up to 20 minutes
- Advanced OCR: Google Cloud Vision API for accurate Korean text recognition
- Smart Cropping: Remove YouTube channel logos before OCR processing
- Smart Deduplication: Remove duplicate subtitles while preserving timing
- Bulk Translation: Translate Korean text to English with proper context
- Optional Translation: Google Cloud Translate for Korean to English conversion
- SRT Generation: Properly formatted subtitle files for both languages
- Real-time Progress: WebSocket updates during processing
- Responsive Design: Works on desktop, tablet, and mobile devices
- Comprehensive Testing: Unit, integration, and end-to-end test coverage
- Frontend: React 18 + TypeScript + Vite + Tailwind CSS
- Backend: Python FastAPI + SQLAlchemy + Redis
- External APIs: Google Cloud Vision (OCR) + Google Cloud Translate
- Database: SQLite (development) / PostgreSQL (production)
- Testing: Pytest + Vitest + Playwright
korean-subtitle-extractor/
├── frontend/                  # React TypeScript app
│   ├── src/
│   │   ├── components/        # UI components
│   │   ├── hooks/             # Custom React hooks
│   │   ├── services/          # API client
│   │   ├── types/             # TypeScript definitions
│   │   └── test/              # Frontend tests
│   ├── tests/e2e/             # Playwright E2E tests
│   └── playwright.config.ts
├── backend/                   # Python FastAPI
│   ├── app/
│   │   ├── api/v1/            # API endpoints
│   │   ├── services/          # Business logic
│   │   ├── models/            # Database models
│   │   ├── utils/             # Utilities
│   │   └── main.py
│   ├── tests/
│   │   ├── unit/              # Unit tests
│   │   └── integration/       # Integration tests
│   └── requirements.txt
└── README.md
- Python 3.9+
- Node.js 18+
- Redis Server
- Google Cloud Project with Vision API and Translate API enabled
- Google Cloud Service Account with appropriate permissions
git clone <repository-url>
cd korean-subtitle-extractor
cd backend
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
cd frontend
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
- Create a Google Cloud Project
- Enable the following APIs:
- Cloud Vision API
- Cloud Translation API
- Create a Service Account
- Download the service account JSON key
- Copy the entire JSON content and paste it as a single-line string in your backend `.env` file:
GOOGLE_APPLICATION_CREDENTIALS={"type":"service_account","project_id":"...","private_key":"...","client_email":"..."}
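Because the credentials live in `.env` as a JSON string rather than a file path, it can be worth validating them at startup. A minimal sketch (the helper name and the required-key set are illustrative, not part of the project's code):

```python
import json

# Fields the Google Cloud client libraries need from a service-account key.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}

def load_service_account_info(raw: str) -> dict:
    """Parse the GOOGLE_APPLICATION_CREDENTIALS JSON string and verify
    that the fields the Vision/Translate clients rely on are present."""
    info = json.loads(raw)
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"credentials JSON missing keys: {sorted(missing)}")
    return info

# The resulting dict can then be handed to
# google.oauth2.service_account.Credentials.from_service_account_info(info)
# when constructing the Vision and Translate clients.
```

Failing fast here gives a clear error message instead of an opaque authentication failure on the first OCR request.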
# Google Cloud (JSON string containing service account credentials)
GOOGLE_APPLICATION_CREDENTIALS={"type":"service_account","project_id":"your-project-id","private_key_id":"","private_key":"-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY_HERE\n-----END PRIVATE KEY-----\n","client_email":"your-service-account@your-project.iam.gserviceaccount.com","client_id":"","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"","universe_domain":"googleapis.com"}
GOOGLE_CLOUD_PROJECT=your-project-id
# Database
DATABASE_URL=sqlite:///./app.db
# Redis
REDIS_URL=redis://localhost:6379
# API Settings
MAX_VIDEO_DURATION=1200 # 20 minutes
CORS_ORIGINS=http://localhost:5173  # frontend dev server origin (Vite default)
# Processing Settings
FRAME_CROP_RATIO=0.1
OCR_BATCH_SIZE=10
# Analytics (Optional)
GA4_MEASUREMENT_ID=G-XXXXXXXXXX
CLARITY_PROJECT_ID=your-clarity-id
# API Configuration
VITE_API_URL=http://localhost:8000
VITE_WS_URL=ws://localhost:8000
# Analytics (Optional)
VITE_GA4_MEASUREMENT_ID=G-XXXXXXXXXX
VITE_CLARITY_PROJECT_ID=your-clarity-id
cd backend
# Run database migrations (if using Alembic)
alembic upgrade head
# Windows (if installed via Chocolatey)
redis-server
# macOS (if installed via Homebrew)
brew services start redis
# Linux
sudo systemctl start redis
cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
cd frontend
npm run dev
The application will be available at:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
cd backend
# Run all tests
pytest
# Run with coverage
pytest --cov=app --cov-report=html
# Run specific test categories
pytest tests/unit/ # Unit tests only
pytest tests/integration/ # Integration tests only
cd frontend
# Run unit/component tests
npm run test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
cd frontend
# Install Playwright browsers (first time only)
npx playwright install
# Run E2E tests
npm run test:e2e
# Run E2E tests with UI
npm run test:e2e:ui
cd backend
# Linting
pylint app/
# Formatting
black app/
isort app/
# Type checking
mypy app/
cd frontend
# Linting
npm run lint
# Fix linting issues
npm run lint:fix
# Type checking
npm run type-check
- Video Download: yt-dlp extracts video info and validates duration (≤20 min)
- Frame Extraction: OpenCV extracts 1 frame per second with timestamps
- Frame Cropping: Remove 10% from sides to eliminate channel logos
- OCR Processing: Google Cloud Vision API batch processes frames (10 frames/request)
- Text Deduplication: Remove duplicate text using 90% similarity threshold
- Bulk Translation: Translate all Korean texts together for better context
- SRT Generation: Create properly formatted subtitle files
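Steps 5 and 7 of the pipeline can be sketched roughly as follows (illustrative function names; the actual services live under `backend/app/services/`). Deduplication compares each OCR result against the last kept subtitle using a 90% similarity ratio, and SRT output needs `HH:MM:SS,mmm` timestamps:

```python
from difflib import SequenceMatcher

def dedupe_subtitles(entries, threshold=0.9):
    """Drop consecutive OCR results whose text is >= `threshold` similar,
    keeping the timestamp of the first occurrence.

    `entries` is a list of (timestamp_seconds, text) pairs in frame order."""
    kept = []
    for ts, text in entries:
        if kept and SequenceMatcher(None, kept[-1][1], text).ratio() >= threshold:
            continue  # near-duplicate of the previous subtitle
        kept.append((ts, text))
    return kept

def to_srt_timestamp(seconds):
    """Format a float number of seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```

Comparing only against the *previous* kept entry (rather than all entries) preserves timing when the same line legitimately reappears later in the video.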
- Enter YouTube URL: Paste a YouTube video URL (max 20 minutes)
- Choose Translation: Toggle English translation on/off
- Start Processing: Click "Process Video" to begin
- Monitor Progress: Watch real-time progress updates
- Download SRT Files: Download Korean and/or English subtitle files
Once the backend is running, visit:
- Interactive API Docs: http://localhost:8000/docs
- ReDoc Documentation: http://localhost:8000/redoc
- POST /api/v1/process - Start video processing
- GET /api/v1/status/{job_id} - Get job status
- GET /api/v1/download/{job_id}/{language} - Download SRT file
- WebSocket /ws/progress/{job_id} - Real-time progress updates
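A minimal client flow against these endpoints might look like the sketch below. The base URL assumes the local dev server, and the request/response field names (`url`, `translate`, `job_id`) are illustrative; check the interactive docs at `/docs` for the authoritative schema:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev server

def start_processing(youtube_url, translate=True):
    """POST the YouTube URL to /api/v1/process and return the job id."""
    body = json.dumps({"url": youtube_url, "translate": translate}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/process",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]

def status_url(job_id):
    """Poll this URL for job status (or subscribe to /ws/progress/{job_id})."""
    return f"{BASE_URL}/api/v1/status/{job_id}"

def download_url(job_id, language):
    """URL of the finished SRT file, e.g. language 'ko' or 'en'."""
    return f"{BASE_URL}/api/v1/download/{job_id}/{language}"
```

Polling the status endpoint works, but the WebSocket route avoids request overhead during long OCR runs.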
ImportError: No module named 'app'
# Make sure you're in the backend directory and virtual environment is activated
cd backend
source venv/bin/activate  # or venv\Scripts\activate on Windows
Google Cloud Authentication Error
# Verify credentials file exists and path is correct
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
# Test authentication
gcloud auth application-default login
Redis Connection Error
# Check if Redis is running
redis-cli ping
# Start Redis if not running
redis-server
Port 5173 already in use
# Use different port
npm run dev -- --port 3000
TypeScript Errors
# Run type checking
npm run type-check
# Update types
npm run type-check -- --watch
- 1 frame per second extraction rate for accuracy
- 20-minute maximum video duration for MVP
- Frame cropping to remove YouTube channel logos
- Order preservation for OCR results (handle empty frames)
- Bulk translation for better context
- Complete data relationships: frame→timestamp→OCR→translation
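The frame→timestamp→OCR→translation relationship can be pictured as a simple record type. This is an illustrative sketch, not the project's actual model definition (the real `FrameOCRMapping` lives in the backend models):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FrameOCRMapping:
    """Links one extracted frame to its timestamp, OCR result, and
    (optional) translation, so ordering survives every pipeline stage."""
    frame_index: int                     # position in the 1-frame-per-second sequence
    timestamp: float                     # seconds from video start
    korean_text: str                     # "" when OCR found no text in this frame
    english_text: Optional[str] = None   # filled in after bulk translation

    @property
    def is_empty(self) -> bool:
        """Empty frames are kept (not dropped) to preserve ordering."""
        return self.korean_text == ""
```

Keeping one record per frame, even for empty OCR results, is what makes it safe to translate all Korean texts in one bulk request and then map the results back to timestamps by index.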
- Always crop frames before OCR to remove channel logos
- Maintain data relationships using FrameOCRMapping throughout
- Preserve frame order even when OCR returns empty results
- Use bulk translation for better context, then map back to timestamps
- Track analytics events for user behavior insights
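The "always crop frames before OCR" rule reduces to computing horizontal bounds from `FRAME_CROP_RATIO`. A sketch (helper name is illustrative; the real pipeline applies this to OpenCV frames):

```python
def crop_bounds(width, ratio=0.1):
    """Return (left, right) pixel columns that keep the centre of the
    frame, trimming `ratio` of the width from each side.

    With the default ratio of 0.1, 10% is removed from each side,
    which is where YouTube channel logos typically sit."""
    margin = int(width * ratio)
    return margin, width - margin

# With a NumPy/OpenCV frame this would be applied as:
#   left, right = crop_bounds(frame.shape[1], FRAME_CROP_RATIO)
#   cropped = frame[:, left:right]
```

Cropping horizontally only is deliberate: hardcoded Korean subtitles sit near the bottom centre, so trimming the sides removes logos without touching subtitle text.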
The application integrates with:
- Google Analytics 4 for user behavior tracking
- Microsoft Clarity for session recordings and heat maps
- Google Search Console for search performance monitoring
- Input validation for YouTube URLs
- Rate limiting on API endpoints
- Secure file handling and cleanup
- Environment variable protection
- CORS configuration for production
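The YouTube URL validation step can be sketched with a small pattern check. This is illustrative; the project's real validator may accept more URL shapes (shorts, embeds, extra query parameters):

```python
import re

# Matches standard watch URLs and youtu.be short links; video ids are 11 chars.
_YT_PATTERN = re.compile(
    r"^https?://(?:www\.)?"
    r"(?:youtube\.com/watch\?v=|youtu\.be/)"
    r"(?P<video_id>[A-Za-z0-9_-]{11})"
)

def extract_video_id(url: str) -> str:
    """Return the 11-character YouTube video id, or raise ValueError."""
    match = _YT_PATTERN.match(url)
    if not match:
        raise ValueError(f"not a recognised YouTube URL: {url!r}")
    return match.group("video_id")
```

Rejecting malformed URLs before they reach yt-dlp keeps the error in the API layer, where it can be returned as a clean 4xx response instead of a downloader failure mid-job.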
- Fork the repository
- Create a feature branch:
git checkout -b feature/new-feature
- Make your changes and add tests
- Run the test suite:
npm test and pytest
- Commit your changes:
git commit -m 'Add new feature'
- Push to the branch:
git push origin feature/new-feature
- Submit a pull request
- Follow TypeScript strict mode
- Write unit tests for new features
- Update integration tests for API changes
- Follow existing code style and patterns
- Update documentation for new features
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues:
- Check the troubleshooting section above
- Search existing GitHub Issues
- Create a new issue with detailed reproduction steps
- Include logs and error messages
Built with ❤️ for the Korean language learning community