A powerful tool for extracting actionable insights from YouTube videos. Transform video content into comprehensive, structured markdown reports with detailed insights, actionable frameworks, and key moments - designed to maximize value and understanding from any video content.
- Full-Context Analysis: Processes entire video transcripts at once for comprehensive understanding and insights
- Structured Insights: Generates detailed paragraphs (not bullet points) with context, examples, and actionable details
- Executive Summaries: 2-3 paragraph summaries capturing core messages and value propositions
- Step-by-Step Frameworks: Detailed, actionable methodologies with clear implementation steps
- Multiple LLM Support: Works with OpenAI (GPT-5, GPT-4o), Anthropic, or local models via Ollama
- GPT-5 Enhanced: Optimized for GPT-5 with unlimited token processing and JSON error recovery
- Web Interface: User-friendly Streamlit UI with real-time progress and category organization
- Intelligent Caching: Caches transcripts and LLM responses for faster processing
- Whisper Fallback: Local transcription when YouTube transcripts aren't available
- Rich CLI: Full-featured command-line interface with progress indicators
- Batch Processing: Process multiple videos efficiently
New to the tool? See QUICKSTART.md for a 5-minute setup guide.
# Clone the repository
git clone https://github.com/yourusername/youtube-extractor-tool.git
cd youtube-extractor-tool
# Option 1: Install with pip (recommended)
pip install -e .
# Option 2: Install with optional Whisper support
pip install -e ".[whisper]"
# Option 3: Install with development dependencies
pip install -e ".[dev,whisper]"
# Option 4: Install from requirements.txt
pip install -r requirements.txt
- Copy the example environment file:
cp .env.example .env
- Edit .env and set your API key:
# For OpenAI (GPT-5 provides highest quality analysis)
LLM_MODEL=gpt-5
# LLM_MODEL=gpt-4o-mini # Faster/cheaper alternative
OPENAI_API_KEY=sk-your-openai-key-here
# For Anthropic Claude
# LLM_MODEL=claude-3-5-sonnet-latest
# ANTHROPIC_API_KEY=sk-ant-api-your-key-here
# For local models (Privacy-focused)
# LLM_MODEL=ollama/llama3.1:8b
# Set output directory (files organized by category)
DEFAULT_OUTPUT_DIR=./outputs
# Process a single video with category (saves to ./outputs/AI/Agents/)
python -m yt_extractor.cli process "https://www.youtube.com/watch?v=VIDEO_ID" --output-dir ./outputs --category "AI/Agents"
# Start the web UI (user-friendly interface)
source venv/bin/activate && streamlit run web_ui.py
# Process multiple videos
python -m yt_extractor.cli process "https://www.youtube.com/watch?v=ID1" "https://www.youtube.com/watch?v=ID2"
# Process from file (one URL per line)
echo "https://www.youtube.com/watch?v=ID1" > videos.txt
python -m yt_extractor.cli batch videos.txt
# Check configuration
python -m yt_extractor.cli config check
# View video info without processing
python -m yt_extractor.cli info "https://www.youtube.com/watch?v=VIDEO_ID"
- process [URLs...] - Process one or more YouTube videos
- batch FILE - Process multiple videos from a file
- info URL - Show video information without processing
- config check - Validate configuration and show current settings
- config init - Create a new .env configuration file
- cache stats - Show cache statistics
- cache clear - Clear all cached data
- --output-dir, -o - Specify output directory
- --format - Choose output format (markdown, json)
- --dry-run - Preview without saving files
- --verbose, -v - Enable verbose output
- --concurrent, -c - Set concurrent processing limit
The tool generates comprehensive markdown files with this structure:
---
type: video-notes
source: youtube
url: https://www.youtube.com/watch?v=VIDEO_ID
video_id: VIDEO_ID
title: Video Title
channel: Channel Name
published: 20240315
created: 2024-03-15 10:30
tags: ["tag1", "tag2"]
---
# Video Title
- Channel: Channel Name
- Published: 2024-03-15
- Duration: 25 minutes and 30 seconds
- URL: [Watch here](https://www.youtube.com/watch?v=VIDEO_ID)
## Executive Summary
Comprehensive 2-3 paragraph overview capturing the core message, value proposition, and key themes. This section provides full context understanding by analyzing the complete transcript, identifying overarching concepts, and connecting different parts of the content for maximum insight value.
The second paragraph continues the analysis, highlighting strategic implications, practical applications, and the broader significance of the content within its domain.
## Key Insights
### Major Concept Title
Detailed paragraph explaining the first major insight with full context, specific examples from the video, strategic reasoning, and actionable details. Each insight is structured as a comprehensive analysis (3-5 sentences) that provides genuine value rather than surface-level bullet points. The analysis includes specific strategies, methodologies, and reasoning patterns discussed in the video.
### Another Key Concept
Second structured paragraph about another important insight, including context, examples, and practical applications. These insights are generated through full-transcript analysis, ensuring deep understanding and meaningful connections between concepts.
## Frameworks & Methods
### Framework Name
Description of what the framework does and why it's valuable for the reader.
**Steps:**
1. First step with detailed explanation and context from the video
2. Second step with practical examples and implementation guidance
3. Third step with additional context and best practices
**Reference:** [t=15:30]
## Key Timestamps
Important moments for easy navigation:
- **[t=05:30]** Description of key moment or concept introduction
- **[t=12:45]** Important insight or framework explanation
- **[t=18:20]** Critical implementation detail or example

| Variable | Description | Default |
|---|---|---|
| LLM_MODEL | LLM model to use | gpt-4o-mini |
| OPENAI_API_KEY | OpenAI API key | - |
| ANTHROPIC_API_KEY | Anthropic API key | - |
| DEFAULT_OUTPUT_DIR | Output directory | ./notes |
| ENABLE_CACHE | Enable caching | true |
| CACHE_DIR | Cache directory | ./.cache |
| REPORT_TZ | Timezone for timestamps | America/Costa_Rica |
| MAX_CONCURRENT_VIDEOS | Max concurrent video processing | 3 |

| Variable | Description | Default |
|---|---|---|
| WHISPER_MODEL | Whisper model size | base |
| WHISPER_DEVICE | Device (auto/cuda/cpu) | auto |
| WHISPER_COMPUTE_TYPE | Compute precision | float16 |
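As a rough illustration of how these variables and their defaults fit together, the sketch below reads them from an environment mapping and falls back to the documented defaults. The names `Settings`, `load_settings`, and `_as_bool` are hypothetical and are not taken from the tool's own `config.py`; the accepted truthy strings are also an assumption.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings (assumed set: 1, true, yes, on)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the variables in the tables above."""
    llm_model: str
    default_output_dir: str
    enable_cache: bool
    cache_dir: str
    report_tz: str
    max_concurrent_videos: int
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping, using the documented defaults as fallbacks."""
    env = os.environ if env is None else env
    return Settings(
        llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),
        default_output_dir=env.get("DEFAULT_OUTPUT_DIR", "./notes"),
        enable_cache=_as_bool(env.get("ENABLE_CACHE", "true")),
        cache_dir=env.get("CACHE_DIR", "./.cache"),
        report_tz=env.get("REPORT_TZ", "America/Costa_Rica"),
        max_concurrent_videos=int(env.get("MAX_CONCURRENT_VIDEOS", "3")),
        whisper_model=env.get("WHISPER_MODEL", "base"),
        whisper_device=env.get("WHISPER_DEVICE", "auto"),
        whisper_compute_type=env.get("WHISPER_COMPUTE_TYPE", "float16"),
    )
```

Calling `load_settings()` with no argument reads from the process environment; passing a plain dict makes the defaults easy to inspect in isolation. To confirm what the tool itself resolves, use `python -m yt_extractor.cli config check`.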
The tool automatically caches:
- Transcripts: Avoid re-downloading YouTube transcripts (7 days TTL)
- LLM Responses: Skip re-processing identical full transcripts (30 days TTL)
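The two TTLs above can be pictured with a small file-based cache. This is only a sketch of the general technique, not the tool's actual `cache.py`: the function names, the one-JSON-file-per-key layout, and the SHA-256 key scheme are all assumptions made for illustration.

```python
import hashlib
import json
import time
from pathlib import Path

# TTLs taken from the list above, expressed in seconds.
TRANSCRIPT_TTL = 7 * 24 * 3600
LLM_TTL = 30 * 24 * 3600


def cache_key(*parts: str) -> str:
    """Derive a stable key from identifying strings (hypothetical scheme)."""
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


def cache_put(cache_dir, key, value, now=None):
    """Store a JSON-serializable value together with its storage time."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    record = {"stored_at": time.time() if now is None else now, "value": value}
    (directory / f"{key}.json").write_text(json.dumps(record), encoding="utf-8")


def cache_get(cache_dir, key, ttl_seconds, now=None):
    """Return the cached value, or None if it is missing or older than the TTL."""
    entry = Path(cache_dir) / f"{key}.json"
    if not entry.exists():
        return None
    record = json.loads(entry.read_text(encoding="utf-8"))
    current = time.time() if now is None else now
    if current - record["stored_at"] > ttl_seconds:
        entry.unlink()  # expired entries are removed on read
        return None
    return record["value"]
```

Under this scheme a transcript would be looked up with `cache_get(cache_dir, cache_key(video_id, "transcript"), TRANSCRIPT_TTL)`, while an LLM response would be keyed on the model name plus the full transcript text, so that only identical inputs hit the cache.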
Cache management:
# View cache statistics
python -m yt_extractor.cli cache stats
# Clear cache
python -m yt_extractor.cli cache clear
For videos without YouTube transcripts, install Whisper support:
# Install optional Whisper dependencies
pip install faster-whisper soundfile
# Requires ffmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
Create a file with one YouTube URL per line:
https://www.youtube.com/watch?v=VIDEO_ID_1
https://www.youtube.com/watch?v=VIDEO_ID_2
https://www.youtube.com/watch?v=VIDEO_ID_3
Then process them all:
python -m yt_extractor.cli batch videos.txt --concurrent 3
# Run all tests
pytest
# Run with coverage
pytest --cov=yt_extractor
# Run specific tests
pytest tests/test_models.py -v
# Format code
black .
isort .
# Type checking
mypy yt_extractor
yt_extractor/
├── core/
│   ├── __init__.py
│   ├── config.py # Configuration management
│   ├── exceptions.py # Custom exceptions
│   ├── extractor.py # Main extractor class
│   └── models.py # Data models
├── llm/
│   ├── __init__.py
│   ├── processor.py # LLM interaction
│   └── prompts.py # Prompt templates
├── utils/
│   ├── __init__.py
│   ├── cache.py # Caching utilities
│   ├── transcript.py # Transcript processing
│   ├── formatting.py # Output formatting
│   └── retry.py # Retry mechanisms
└── cli.py # Command-line interface
This repository includes comprehensive documentation:
- QUICKSTART.md - 5-minute setup guide for new users
- CONFIGURATION.md - Complete configuration reference with examples
- USAGE_EXAMPLES.md - Real-world usage scenarios and best practices
- CLAUDE.md - Architecture guide for development with Claude Code
- TESTING.md - Testing guide and procedures
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
- Create an issue for bugs or feature requests
- Check existing issues before creating new ones
- yt-dlp for robust YouTube metadata extraction
- youtube-transcript-api for transcript fetching
- LiteLLM for unified LLM API access
- Rich for beautiful terminal output