A modern desktop application for downloading videos, transcribing with GPU-accelerated AI, and analyzing content with local LLMs. Process URLs or local files with ease.
*(Screenshots: Download & Transcribe tab and AI Analysis tab)*
Modern tabbed interface with real-time progress tracking and intuitive controls
- 1000+ Supported Sites: YouTube, TikTok, Twitter, Vimeo, Reddit, Twitch, and more
- Local File Support: Browse and process local video files directly
- Mixed Input Processing: Process URLs and local files simultaneously
- Smart File Management: Copy files to organized assets/ folder or process in-place
- Whisper AI Integration: State-of-the-art speech recognition
- CUDA Support: Leverage your NVIDIA GPU for fast transcription
- Multiple Model Sizes: From tiny.en (fast) to large-v3 (accurate)
- Real-Time Progress: Visual progress tracking with time-based estimation
- Smart Filenames: Auto-generated transcript names with duplicate handling
- Privacy-First: All analysis runs locally using Ollama
- Comprehensive Analysis: Summaries, quotes, topics, and sentiment
- Custom Prompts: Define your own analysis queries
- Multiple Models: Support for llama3.2, mistral, codellama, and more
- Dark Theme: Beautiful, eye-friendly interface
- Intuitive Workflow: Tabbed navigation (Download → Analysis → Results)
- Real-Time Feedback: Progress bars, status indicators, and validation
- Clickable Paths: Quick access to output folders
- Copy to Clipboard: One-click copying of transcripts and paths
- Processing Queue: Visual queue display for batch operations
- Centralized Assets: All content organized in the assets/ folder
- Smart Organization: Separate folders for videos, transcripts, and analysis
- Easy Access: Click folder paths to open in file explorer
- Python 3.11+ (3.12 recommended)
- NVIDIA GPU with CUDA support (recommended for Whisper)
- Ollama installed and running (for AI analysis)
- Windows 10/11 (tested, may work on Linux/Mac)
1. Clone or download the repository

   ```bash
   git clone https://github.com/yourusername/TranscriptAI.git
   cd TranscriptAI
   ```

2. Set up a virtual environment and install dependencies

   ```bash
   # Create virtual environment
   uv venv

   # Activate virtual environment (Windows)
   .venv\Scripts\activate

   # Install dependencies
   uv pip install -r requirements.txt
   ```

3. Install CUDA-enabled PyTorch (for GPU acceleration)

   ```bash
   uv pip install torch==2.4.1+cu121 --index-url https://download.pytorch.org/whl/cu121
   ```

4. Set up Ollama (for AI analysis)

   ```bash
   # Download from https://ollama.ai
   # Pull a model (e.g., llama3.2)
   ollama pull llama3.2
   ```

5. Launch the application

   ```bash
   python run.py
   ```

   Or use the Windows launcher:

   ```bash
   TranscriptAI.bat
   ```
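Before the first launch, you can confirm the GPU install from the activated environment; a minimal check (CPU-only machines will print `False` and fall back to slower transcription):

```python
# Quick sanity check that PyTorch was installed with working CUDA support.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```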
1. Enter Video URL or Select Local Files

   - Paste any video URL in the input field, or
   - Click "Browse" to select local video files
   - Mix URLs and files: `https://youtube.com/watch?v=abc; C:/videos/file.mp4`

2. Configure Settings

   - Whisper Model: Choose accuracy vs. speed (medium.en recommended)
   - Keep Video: Keep downloaded videos after transcription
   - Copy Files: Copy local files to the assets/ folder (or process in-place)
   - Download Only: Skip transcription, just download videos

3. Start Processing

   - Click "Start Download & Transcription"
   - Watch real-time progress in the queue and progress bars
   - View detailed logs in the process log section

4. Transcript Auto-Loads

   - After transcription, the app automatically navigates to the Analysis tab
   - The transcript is ready for analysis

5. Select AI Model

   - Choose from the available Ollama models (llama3.2 recommended)
   - Click "Analyze with AI"

6. View Results

   - Summary: Concise overview of content
   - Quotes: Most memorable and quotable moments
   - Topics: Key themes and subjects discussed
   - Sentiment: Emotional tone analysis
   - Custom: Run your own analysis prompts

7. Copy Transcript

   - Click "Copy Transcript" to copy the full text to the clipboard

8. Review Analysis

   - Navigate through the result tabs
   - View formatted analysis results

9. Export Options

   - Multiple formats: JSON, Markdown, HTML, PDF, TXT
   - Copy to clipboard
   - Save to file
```
TranscriptAI/
├── src/
│   ├── config/          # Path configuration
│   ├── core/            # Business logic (downloader, transcriber, analyzer)
│   └── ui/              # User interface components
├── assets/              # Generated content
│   ├── videos/          # Downloaded videos
│   ├── transcripts/     # Generated transcripts
│   └── analysis/        # Analysis results (future)
├── requirements.txt     # Python dependencies
└── run.py               # Application entry point
```
- Backend: Python 3.12, AsyncIO for non-blocking operations
- UI Framework: PySide6 (Qt for Python) with custom dark theme
- Video Download: yt-dlp (supports 1000+ sites)
- Transcription: OpenAI Whisper with PyTorch CUDA
- AI Analysis: Ollama for local LLM inference
- Package Management: uv for fast dependency resolution
- Supports any site that yt-dlp supports (YouTube, TikTok, Twitter, Vimeo, etc.)
- Automatic format selection (best quality MP4)
- Progress tracking with download speed and ETA
- Error handling and retry logic
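Downloads are driven by yt-dlp. A minimal sketch of the equivalent Python call with a progress hook for speed and ETA reporting; the format string and output template here are illustrative assumptions, not necessarily the app's exact options:

```python
# Illustrative yt-dlp usage; options are assumptions, not the app's exact config.
from yt_dlp import YoutubeDL

def on_progress(d: dict) -> None:
    # yt-dlp calls this hook repeatedly while downloading
    if d["status"] == "downloading":
        print(d.get("_percent_str"), d.get("_speed_str"), "ETA", d.get("_eta_str"))

opts = {
    "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4",  # prefer best-quality MP4
    "outtmpl": "assets/videos/%(title)s.%(ext)s",           # organized output path
    "progress_hooks": [on_progress],
}
with YoutubeDL(opts) as ydl:
    ydl.download(["https://youtube.com/watch?v=abc"])
```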
- Browse and select multiple video files
- Support for MP4, AVI, MOV, MKV, WebM, FLV, WMV, M4V, 3GP
- Option to copy files to assets/ or process in-place
- Automatic duplicate handling
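A common way to implement the duplicate handling is to suffix the filename until it is unique; a hypothetical helper in that spirit (the app's actual naming scheme may differ):

```python
# Hypothetical duplicate-handling helper; illustrative only.
from pathlib import Path

def unique_path(target: Path) -> Path:
    """Return target unchanged if free, else append _1, _2, ... to the stem."""
    if not target.exists():
        return target
    n = 1
    while target.with_name(f"{target.stem}_{n}{target.suffix}").exists():
        n += 1
    return target.with_name(f"{target.stem}_{n}{target.suffix}")

print(unique_path(Path("assets/transcripts/example.txt")))
```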
- Process URLs and local files in the same batch
- Real-time input validation
- Visual feedback showing URL/file counts
- Sequential processing with queue visualization
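Before queueing, the semicolon-separated input has to be split into URLs and local paths. A hypothetical sketch of that validation step (`split_inputs` is illustrative, not the app's actual function):

```python
# Hypothetical input splitter; the real validation in src/ui may differ.
from pathlib import Path

def split_inputs(raw: str) -> tuple[list[str], list[Path]]:
    urls: list[str] = []
    files: list[Path] = []
    for item in (part.strip() for part in raw.split(";")):
        if not item:
            continue
        if item.startswith(("http://", "https://")):
            urls.append(item)
        else:
            files.append(Path(item))
    return urls, files

urls, files = split_inputs("https://youtube.com/watch?v=abc; C:/videos/file.mp4")
print(f"{len(urls)} URL(s), {len(files)} file(s)")
```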
- CUDA acceleration for NVIDIA GPUs
- Multiple Whisper model sizes:
  - tiny.en: Fastest, lower accuracy
  - base.en: Fast, good for short videos
  - small.en: Balanced speed/accuracy
  - medium.en: Recommended default
  - large-v3: Best accuracy, slower
- Real-time progress estimation
- Smart transcript filename generation
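At its core, transcription reduces to openai-whisper's `load_model` and `transcribe` calls with the device picked at runtime; a minimal sketch (model name and file path are placeholders):

```python
# Minimal Whisper transcription sketch; paths and model are placeholders.
import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("medium.en", device=device)
result = model.transcribe("assets/videos/example.mp4")

print(result["text"][:200])  # first 200 characters of the transcript
```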
- Summaries: Concise overviews of content
- Quotes: Extract most memorable moments
- Topics: Identify key themes and subjects
- Sentiment: Analyze emotional tone
- Custom: User-defined analysis prompts
- All processing happens locally (privacy-first)
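Analysis goes through Ollama's local REST API, which listens on `localhost:11434` by default; a minimal summary request (the prompt wording is illustrative):

```python
# Minimal request to Ollama's local REST API; prompt text is illustrative.
import requests

transcript = "..."  # transcript text loaded elsewhere

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": f"Summarize this transcript concisely:\n\n{transcript}",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
print(resp.json()["response"])
```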
- Dark theme with teal accents
- Real-time progress indicators
- Color-coded status messages
- Clickable folder paths (open in explorer)
- Copy buttons for quick clipboard access
- Processing queue visualization
- Input validation with visual feedback
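The clickable folder paths can be implemented with Qt's `QDesktopServices`; a sketch of that pattern (the real app already has a running `QApplication`):

```python
# Opens a folder in the system file explorer via Qt; a sketch of the pattern.
import sys
from PySide6.QtCore import QUrl
from PySide6.QtGui import QDesktopServices
from PySide6.QtWidgets import QApplication

app = QApplication(sys.argv)  # the real app already has one of these
QDesktopServices.openUrl(QUrl.fromLocalFile("assets/transcripts"))
```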
Edit `src/config/paths.py`:

```python
class ProjectPaths:
    BASE_DIR = Path.cwd()
    ASSETS_DIR = BASE_DIR / "assets"  # Change this
    VIDEOS_DIR = ASSETS_DIR / "videos"
    TRANSCRIPTS_DIR = ASSETS_DIR / "transcripts"
```

Whisper Model: Edit `src/ui/download_tab.py`:

```python
self.model_combo.setCurrentText("large-v3")  # Change default
```

AI Model: Edit `src/ui/analysis_tab.py`:

```python
self.model_combo.addItems([
    "llama3.2", "your-preferred-model"  # Add your model first
])
```

Problem: "torch.cuda.is_available() is False"
- Install CUDA-enabled PyTorch: `uv pip install torch==2.4.1+cu121 --index-url https://download.pytorch.org/whl/cu121`
- Ensure your Python version is ≤ 3.12
- Verify NVIDIA drivers are up to date
Problem: Progress bar stuck at 0%
- Install ffmpeg (includes ffprobe) for audio duration detection
- Progress uses time-based estimation (2.5x audio duration)
- Check terminal output for actual Whisper progress
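A sketch of that time-based estimate: read the media duration with ffprobe and scale it by the 2.5x factor noted above (the helper is illustrative; assumes ffprobe is on PATH):

```python
# Illustrative duration-based estimate of transcription time.
import subprocess

def estimated_transcription_seconds(path: str, factor: float = 2.5) -> float:
    out = subprocess.check_output([
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ])
    return float(out) * factor

print(estimated_transcription_seconds("assets/videos/example.mp4"))
```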
Problem: "Model not available"
- Ensure Ollama is running: `ollama serve`
- Pull the required models: `ollama pull llama3.2`
- Check that your firewall isn't blocking localhost
Problem: ModuleNotFoundError
- Activate the virtual environment: `.venv\Scripts\activate`
- Install dependencies: `uv pip install -r requirements.txt`
- Check that the Python path matches your setup
- OS: Windows 10/11 (tested), Linux/Mac (may work)
- Python: 3.11+ (3.12 recommended)
- RAM: 8GB+ recommended
- GPU: NVIDIA GPU with CUDA support (recommended)
- Storage: ~2GB for models and dependencies
- Internet: Required for downloads and model fetching
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- yt-dlp for universal video download support
- OpenAI Whisper for speech recognition
- Ollama for local LLM inference
- PySide6 for the UI framework
Built with ❤️ for content creators, researchers, and anyone who wants to extract maximum value from video content.

Transform videos into insights, one transcript at a time.