Extracts hardcoded Korean subs from videos and generates an English SRT file

mohithgupta/SRT_Generator

Korean Subtitle Extractor - Documentation Hub

Welcome to the comprehensive documentation for the Korean Subtitle Extractor MVP - a micro-SaaS application that extracts hardcoded Korean subtitles from YouTube videos using OCR and optionally translates them to English.

📚 Documentation Overview

This documentation provides everything you need to understand, develop, deploy, and maintain the Korean Subtitle Extractor project.

📋 Available Documentation

| Document | Description | Audience |
|---|---|---|
| Project Structure | Complete project architecture and component breakdown | All developers |
| Development Guide | Setup, development workflow, and troubleshooting | Developers |
| API Documentation | Complete REST API and WebSocket documentation | Frontend developers, API consumers |
| Deployment Guide | Production deployment, security, and monitoring | DevOps, System administrators |
| Error Learnings | Complete error documentation (29+ errors, 946 lines) | All developers |
| Error Summary | Condensed error patterns and prevention strategies | Project leads, Senior developers |

🚀 Quick Start Links

🎯 Project Summary

🚀 Features

  • YouTube Video Processing: Extract subtitles from videos up to 20 minutes
  • Advanced OCR: Google Cloud Vision API for accurate Korean text recognition
  • Smart Cropping: Remove YouTube channel logos before OCR processing
  • Smart Deduplication: Remove duplicate subtitles while preserving timing
  • Bulk Translation: Translate Korean text to English with proper context
  • Optional Translation: Google Cloud Translate for Korean to English conversion
  • SRT Generation: Properly formatted subtitle files for both languages
  • Real-time Progress: WebSocket updates during processing
  • Responsive Design: Works on desktop, tablet, and mobile devices
  • Comprehensive Testing: Unit, integration, and end-to-end test coverage

πŸ—οΈ Architecture

  • Frontend: React 18 + TypeScript + Vite + Tailwind CSS
  • Backend: Python FastAPI + SQLAlchemy + Redis
  • External APIs: Google Cloud Vision (OCR) + Google Cloud Translate
  • Database: SQLite (development) / PostgreSQL (production)
  • Testing: Pytest + Vitest + Playwright

korean-subtitle-extractor/
├── frontend/          # React TypeScript app
│   ├── src/
│   │   ├── components/    # UI components
│   │   ├── hooks/         # Custom React hooks
│   │   ├── services/      # API client
│   │   ├── types/         # TypeScript definitions
│   │   └── test/          # Frontend tests
│   ├── tests/e2e/     # Playwright E2E tests
│   └── playwright.config.ts
├── backend/           # Python FastAPI
│   ├── app/
│   │   ├── api/v1/        # API endpoints
│   │   ├── services/      # Business logic
│   │   ├── models/        # Database models
│   │   ├── utils/         # Utilities
│   │   └── main.py
│   ├── tests/
│   │   ├── unit/          # Unit tests
│   │   └── integration/   # Integration tests
│   └── requirements.txt
└── README.md

📋 Prerequisites

  • Python 3.9+
  • Node.js 18+
  • Redis Server
  • Google Cloud Project with Vision API and Translate API enabled
  • Google Cloud Service Account with appropriate permissions

🛠️ Installation & Setup

1. Clone Repository

git clone <repository-url>
cd korean-subtitle-extractor

2. Backend Setup

cd backend

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env

3. Frontend Setup

cd frontend

# Install dependencies
npm install

# Copy environment template
cp .env.example .env

4. Google Cloud Setup

  1. Create a Google Cloud Project
  2. Enable the following APIs:
    • Cloud Vision API
    • Cloud Translation API
  3. Create a Service Account
  4. Download the service account JSON key
  5. Copy the entire JSON content and paste it as a string in your backend .env file:
    GOOGLE_APPLICATION_CREDENTIALS={"type":"service_account","project_id":"...","private_key":"...","client_email":"..."}
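Because the variable holds the key's JSON content rather than a file path, the backend has to parse it before building a client. The following is a minimal sketch of that parsing step, not the project's actual loader; the function name `load_service_account_info` and the set of required keys are assumptions made for illustration.

```python
import json
import os

# Fields checked before the key is used. This particular set is an assumption.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def load_service_account_info(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> dict:
    """Parse service-account JSON stored inline in an environment variable."""
    raw = os.environ.get(env_var, "")
    if not raw.strip():
        raise RuntimeError(f"{env_var} is not set")
    try:
        info = json.loads(raw)
    except json.JSONDecodeError as exc:
        # A file path instead of JSON content ends up here.
        raise RuntimeError(f"{env_var} does not contain valid JSON: {exc}") from exc
    if not isinstance(info, dict):
        raise RuntimeError(f"{env_var} must hold a JSON object")
    missing = [key for key in REQUIRED_KEYS if not info.get(key)]
    if missing:
        raise RuntimeError(f"{env_var} is missing fields: {', '.join(missing)}")
    return info
```

If the `google-auth` package is installed, the returned dictionary can be handed to `google.oauth2.service_account.Credentials.from_service_account_info(info)` to obtain a credentials object.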
    

5. Environment Configuration

Backend (.env)

# Google Cloud (JSON string containing service account credentials)
GOOGLE_APPLICATION_CREDENTIALS={"type":"service_account","project_id":"your-project-id","private_key_id":"","private_key":"-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY_HERE\n-----END PRIVATE KEY-----\n","client_email":"your-service-account@your-project.iam.gserviceaccount.com","client_id":"","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"","universe_domain":"googleapis.com"}
GOOGLE_CLOUD_PROJECT=your-project-id

# Database
DATABASE_URL=sqlite:///./app.db

# Redis
REDIS_URL=redis://localhost:6379

# API Settings
MAX_VIDEO_DURATION=1200  # 20 minutes
CORS_ORIGINS="http://localhost:5173"  # origin of the frontend dev server (Vite's default port)

# Processing Settings
FRAME_CROP_RATIO=0.1
OCR_BATCH_SIZE=10

# Analytics (Optional)
GA4_MEASUREMENT_ID=G-XXXXXXXXXX
CLARITY_PROJECT_ID=your-clarity-id
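
One way to read these values at startup is sketched below. It is an illustration under stated assumptions: the `Settings` class, the `load_settings` function, and the validation bounds are hypothetical, and the defaults simply mirror the template above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    max_video_duration: int  # seconds
    frame_crop_ratio: float  # fraction of the frame width trimmed per side
    ocr_batch_size: int      # frames per OCR request
    redis_url: str
    database_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the template defaults."""
    settings = Settings(
        max_video_duration=int(env.get("MAX_VIDEO_DURATION", "1200")),
        frame_crop_ratio=float(env.get("FRAME_CROP_RATIO", "0.1")),
        ocr_batch_size=int(env.get("OCR_BATCH_SIZE", "10")),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        database_url=env.get("DATABASE_URL", "sqlite:///./app.db"),
    )
    # Fail fast on values that would break the pipeline later (bounds are assumptions).
    if settings.max_video_duration <= 0:
        raise ValueError("MAX_VIDEO_DURATION must be positive")
    if not 0 <= settings.frame_crop_ratio < 0.5:
        raise ValueError("FRAME_CROP_RATIO must be in [0, 0.5)")
    if settings.ocr_batch_size < 1:
        raise ValueError("OCR_BATCH_SIZE must be at least 1")
    return settings
```

The sketch expects the loader that reads `.env` to have already stripped inline comments such as `# 20 minutes`.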

Frontend (.env)

# API Configuration
VITE_API_URL=http://localhost:8000
VITE_WS_URL=ws://localhost:8000

# Analytics (Optional)
VITE_GA4_MEASUREMENT_ID=G-XXXXXXXXXX
VITE_CLARITY_PROJECT_ID=your-clarity-id

6. Database Setup

cd backend

# Run database migrations (if using Alembic)
alembic upgrade head

7. Start Services

Start Redis (if not running as a service)

# Windows (if installed via Chocolatey)
redis-server

# macOS (if installed via Homebrew)
brew services start redis

# Linux
sudo systemctl start redis

Start Backend Server

cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Start Frontend Development Server

cd frontend
npm run dev

The application will be available at:

  • Frontend: http://localhost:5173 (Vite's default dev port)
  • Backend API: http://localhost:8000

🧪 Testing

Backend Tests

cd backend

# Run all tests
pytest

# Run with coverage
pytest --cov=app --cov-report=html

# Run specific test categories
pytest tests/unit/          # Unit tests only
pytest tests/integration/   # Integration tests only

Frontend Tests

cd frontend

# Run unit/component tests
npm run test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

End-to-End Tests

cd frontend

# Install Playwright browsers (first time only)
npx playwright install

# Run E2E tests
npm run test:e2e

# Run E2E tests with UI
npm run test:e2e:ui

📝 Code Quality

Backend

cd backend

# Linting
pylint app/

# Formatting
black app/
isort app/

# Type checking
mypy app/

Frontend

cd frontend

# Linting
npm run lint

# Fix linting issues
npm run lint:fix

# Type checking
npm run type-check

📊 Processing Pipeline

  1. Video Download: yt-dlp extracts video info and validates duration (≤ 20 min)
  2. Frame Extraction: OpenCV extracts 1 frame per second with timestamps
  3. Frame Cropping: Remove 10% from sides to eliminate channel logos
  4. OCR Processing: Google Cloud Vision API batch processes frames (10 frames/request)
  5. Text Deduplication: Remove duplicate text using 90% similarity threshold
  6. Bulk Translation: Translate all Korean texts together for better context
  7. SRT Generation: Create properly formatted subtitle files
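
Steps 5 and 7 can be sketched with the standard library alone. The example below is an illustration, not the project's implementation: it assumes one OCR result per second, uses `difflib.SequenceMatcher` as a stand-in similarity measure for the 90% threshold, and only merges consecutive frames. The names `Cue`, `merge_frames`, and `to_srt` are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Cue:
    start: int  # seconds, inclusive
    end: int    # seconds, exclusive
    text: str


def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """True when two OCR strings are at least `threshold` similar."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def merge_frames(frames: list[tuple[int, str]], threshold: float = 0.9) -> list[Cue]:
    """Collapse runs of consecutive, near-identical frames into single cues.

    `frames` holds (second, ocr_text) pairs in time order; an empty string
    marks a frame where OCR found nothing and ends the current cue.
    """
    cues: list[Cue] = []
    for second, text in frames:
        text = text.strip()
        if not text:
            continue
        last = cues[-1] if cues else None
        if last is not None and last.end == second and similar(last.text, text, threshold):
            last.end = second + 1  # extend the running cue, keep its first text
        else:
            cues.append(Cue(second, second + 1, text))
    return cues


def srt_timestamp(seconds: int) -> str:
    """Format whole seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},000"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{index}\n{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n{cue.text}\n"
        for index, cue in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Keeping the first text of each run and extending only its end time is what preserves timing while the duplicates are dropped.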

📊 Usage

  1. Enter YouTube URL: Paste a YouTube video URL (max 20 minutes)
  2. Choose Translation: Toggle English translation on/off
  3. Start Processing: Click "Process Video" to begin
  4. Monitor Progress: Watch real-time progress updates
  5. Download SRT Files: Download Korean and/or English subtitle files

🔧 API Documentation

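Once the backend is running, visit the interactive documentation that FastAPI serves by default:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc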

Key Endpoints

  • POST /api/v1/process - Start video processing
  • GET /api/v1/status/{job_id} - Get job status
  • GET /api/v1/download/{job_id}/{language} - Download SRT file
  • WebSocket /ws/progress/{job_id} - Real-time progress updates
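
A small client-side helper for these routes might look like the sketch below. Only the paths come from this README. The JSON field names `url` and `translate`, the helper names, and the base URLs (copied from the frontend `.env` defaults) are assumptions, so check the API documentation for the real request schema.

```python
import json
from urllib.parse import quote
from urllib.request import Request

API_BASE = "http://localhost:8000"  # same value as VITE_API_URL
WS_BASE = "ws://localhost:8000"     # same value as VITE_WS_URL


def process_request(video_url: str, translate: bool) -> Request:
    """Build (but do not send) the POST that starts a processing job.

    The body fields `url` and `translate` are assumed, not documented here.
    """
    body = json.dumps({"url": video_url, "translate": translate}).encode("utf-8")
    return Request(
        f"{API_BASE}/api/v1/process",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(job_id: str) -> str:
    """URL for polling the status of a job."""
    return f"{API_BASE}/api/v1/status/{quote(job_id, safe='')}"


def download_url(job_id: str, language: str) -> str:
    """URL for downloading the SRT file in one language."""
    return f"{API_BASE}/api/v1/download/{quote(job_id, safe='')}/{quote(language, safe='')}"


def progress_ws_url(job_id: str) -> str:
    """WebSocket URL for real-time progress updates."""
    return f"{WS_BASE}/ws/progress/{quote(job_id, safe='')}"
```

Passing the result of `process_request(...)` to `urllib.request.urlopen` would send it to a running backend. Percent-encoding the path segments keeps an unexpected job ID from altering the route.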

πŸ› Troubleshooting

Common Issues

Backend Issues

ImportError: No module named 'app'

# Make sure you're in the backend directory and virtual environment is activated
cd backend
source venv/bin/activate  # or venv\Scripts\activate on Windows

Google Cloud Authentication Error

# Verify credentials file exists and path is correct
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

# Alternatively, create Application Default Credentials for local development
gcloud auth application-default login

Redis Connection Error

# Check if Redis is running
redis-cli ping

# Start Redis if not running
redis-server

Frontend Issues

Port 5173 already in use

# Use different port
npm run dev -- --port 3000

TypeScript Errors

# Run type checking
npm run type-check

# Re-run type checking on file changes (assumes the script wraps tsc)
npm run type-check -- --watch

📋 Critical Requirements

  • 1 frame per second extraction rate for accuracy
  • 20-minute maximum video duration for MVP
  • Frame cropping to remove YouTube channel logos
  • Order preservation for OCR results (handle empty frames)
  • Bulk translation for better context
  • Complete data relationships: frame→timestamp→OCR→translation
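
The cropping and order-preservation requirements can be illustrated without any image library. In this sketch, "sides" is taken to mean the left and right edges, and `ocr_batch` stands in for whatever function calls the OCR service; both are assumptions, as are the function names.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def crop_box(width: int, height: int, ratio: float = 0.1) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) after trimming `ratio` of the width per side.

    With a NumPy/OpenCV frame the crop would be frame[top:bottom, left:right].
    """
    margin = int(width * ratio)
    return (margin, 0, width - margin, height)


def batches(items: Sequence, size: int = 10) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items, in order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ocr_in_order(
    frames: Sequence[bytes],
    ocr_batch: Callable[[Sequence[bytes]], List[Optional[str]]],
    size: int = 10,
) -> List[str]:
    """Run OCR batch by batch and return exactly one string per input frame.

    Frames without text become "" instead of being dropped, so index i of the
    result always belongs to frame i and therefore to second i of the video.
    """
    results: List[str] = []
    for batch in batches(frames, size):
        texts = ocr_batch(batch)
        if len(texts) != len(batch):
            raise ValueError("OCR backend must return one result per frame")
        results.extend(text or "" for text in texts)
    return results
```

Raising on a length mismatch keeps one short batch from silently shifting every later subtitle by a frame.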

🚨 Important Notes

  1. Always crop frames before OCR to remove channel logos
  2. Maintain data relationships using FrameOCRMapping throughout
  3. Preserve frame order even when OCR returns empty results
  4. Use bulk translation for better context, then map back to timestamps
  5. Track analytics events for user behavior insights
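
Note 4 (translate in bulk, then map back) can be sketched as follows. Only the name `FrameOCRMapping` comes from this README; its fields, the `translate_in_bulk` function, and the `translate_many` callable that wraps the translation service are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FrameOCRMapping:
    # Field names are assumptions made for this example.
    frame_index: int
    timestamp: float
    korean_text: str = ""
    english_text: str = ""


def translate_in_bulk(
    mappings: List[FrameOCRMapping],
    translate_many: Callable[[List[str]], List[str]],
) -> None:
    """Translate each distinct non-empty text in one call, then write results back.

    The list is mutated in place and never reordered, so every translation
    stays attached to its original frame and timestamp.
    """
    unique: List[str] = []
    seen = set()
    for mapping in mappings:
        text = mapping.korean_text
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    if not unique:
        return
    translated = translate_many(unique)
    if len(translated) != len(unique):
        raise ValueError("translator must return one result per input text")
    lookup = dict(zip(unique, translated))
    for mapping in mappings:
        # Frames with no OCR text keep an empty translation.
        mapping.english_text = lookup.get(mapping.korean_text, "")
```

Sending the distinct texts together gives the translator the surrounding lines as context, and the lookup table restores the per-frame relationship afterwards.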

📈 Analytics & SEO

The application integrates with:

  • Google Analytics 4 for user behavior tracking
  • Microsoft Clarity for session recordings and heat maps
  • Google Search Console for search performance monitoring

🛡️ Security Considerations

  • Input validation for YouTube URLs
  • Rate limiting on API endpoints
  • Secure file handling and cleanup
  • Environment variable protection
  • CORS configuration for production
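
As an example of the first point, a URL check can accept only known YouTube hosts and a well-formed video ID before anything is passed to the downloader. This is a sketch under assumptions: the accepted hosts and paths, the 11-character ID pattern (a common convention, not a guarantee), and the name `extract_video_id` are illustrative.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed ID shape: 11 characters from the URL-safe Base64 alphabet.
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")
_WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for an accepted YouTube URL, otherwise None."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _WATCH_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/")):
            candidate = parsed.path.split("/")[2]
    return candidate if _VIDEO_ID.match(candidate) else None
```

Matching the exact hostname, rather than searching the string for "youtube.com", rejects look-alike domains such as `youtube.com.example.org`.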

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/new-feature
  3. Make your changes and add tests
  4. Run the test suite: npm test and pytest
  5. Commit your changes: git commit -m 'Add new feature'
  6. Push to the branch: git push origin feature/new-feature
  7. Submit a pull request

Development Guidelines

  • Follow TypeScript strict mode
  • Write unit tests for new features
  • Update integration tests for API changes
  • Follow existing code style and patterns
  • Update documentation for new features

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋 Support

If you encounter any issues:

  1. Check the troubleshooting section above
  2. Search existing GitHub Issues
  3. Create a new issue with detailed reproduction steps
  4. Include logs and error messages

Built with ❤️ for the Korean language learning community
