Generate professional karaoke videos with instrumental audio and synchronized lyrics. Available as a local CLI (karaoke-gen) or cloud-based CLI (karaoke-gen-remote) that offloads processing to Google Cloud.
Run all processing locally on your machine. Requires GPU for optimal audio separation performance.
karaoke-gen "ABBA" "Waterloo"
Offload all processing to a cloud backend. No GPU required - just authenticate and submit jobs.
karaoke-gen-remote ./song.flac "ABBA" "Waterloo"
Both CLIs produce identical outputs: 4K karaoke videos, CDG+MP3 packages, audio stems, and more.
- Audio Separation: AI-powered vocal/instrumental separation using MDX and Demucs models
- Lyrics Transcription: Word-level timestamps via AudioShake API
- Lyrics Correction: Match transcription against online lyrics (Genius, Spotify, Musixmatch)
- Human Review: Interactive UI for correcting lyrics before final render
- Video Rendering: High-quality 4K karaoke videos with customizable styles
- Multiple Outputs: MP4 (4K lossless/lossy, 720p), MKV, CDG+MP3, TXT+MP3
- YouTube Upload: Automatic upload to your YouTube channel
- Dropbox Integration: Organize output in brand-coded folders
- Google Drive: Upload to public share folders
- Discord Notifications: Webhook notifications on completion
pip install karaoke-gen
This installs both karaoke-gen (local) and karaoke-gen-remote (cloud) CLIs.
- Python 3.10-3.13
- FFmpeg
- For local processing: CUDA-capable GPU or Apple Silicon CPU recommended
Transcription is required for creating karaoke videos with synchronized lyrics. The system needs word-level timing data to display lyrics in sync with the music.
Commercial service with high-quality transcription. Best for production use.
export AUDIOSHAKE_API_TOKEN="your_audioshake_token"
Get an API key at https://www.audioshake.ai/ (available to business customers only at the time of writing).
Run Whisper directly on your local machine using whisper-timestamped. Works on CPU, NVIDIA GPU (CUDA), or Apple Silicon.
# Install with local Whisper support
pip install "karaoke-gen[local-whisper]"
# Optional: Configure model size (tiny, base, small, medium, large)
export WHISPER_MODEL_SIZE="medium"
# Optional: Force specific device (cpu, cuda, mps)
export WHISPER_DEVICE="cpu"
Model Size Guide:
| Model | VRAM | Speed | Quality |
|---|---|---|---|
| tiny | ~1GB | Fast | Lower |
| base | ~1GB | Fast | Basic |
| small | ~2GB | Medium | Good |
| medium | ~5GB | Slower | Better |
| large | ~10GB | Slowest | Best |
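The table above can be turned into a small helper for picking a model size. This is a minimal sketch: the VRAM figures are the approximate values from the table, and `pick_model_size` is a hypothetical helper, not part of karaoke-gen.

```python
# Approximate VRAM needs in GB, copied from the Model Size Guide table.
# Ordered from smallest to largest model.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model_size(available_vram_gb: float, default: str = "tiny") -> str:
    """Return the largest model whose approximate VRAM need fits.

    Falls back to `default` when nothing fits (for example on CPU-only hosts).
    """
    fitting = [name for name, need in MODEL_VRAM_GB.items()
               if need <= available_vram_gb]
    # Dicts keep insertion order, so the last fitting entry is the largest.
    return fitting[-1] if fitting else default


if __name__ == "__main__":
    size = pick_model_size(6)
    # Print a line that can be pasted into the shell.
    print(f'export WHISPER_MODEL_SIZE="{size}"')
```

The figures are rough, so leaving some headroom (for example, passing a value slightly below the GPU's total memory) is probably safer than an exact fit.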
CPU-Only Installation (no GPU required):
# Pre-install CPU-only PyTorch first
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install "karaoke-gen[local-whisper]"
Local Whisper runs automatically as a fallback when no cloud transcription services are configured.
Cloud-based alternative using OpenAI's Whisper model on RunPod infrastructure.
export RUNPOD_API_KEY="your_runpod_key"
export WHISPER_RUNPOD_ID="your_whisper_endpoint_id"
Set up a Whisper endpoint at https://www.runpod.io/
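To see which transcription providers a shell session would expose, the environment variables described in this README can be inspected directly. The sketch below only illustrates the documented behaviour and is not the selection logic karaoke-gen actually uses: `configured_transcribers` and the provider labels are hypothetical, and it does not check whether whisper-timestamped is installed.

```python
import os
from typing import List, Mapping, Optional


def configured_transcribers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List transcription providers that appear configured in `env`."""
    env = os.environ if env is None else env
    providers = []
    if env.get("AUDIOSHAKE_API_TOKEN"):
        providers.append("audioshake")
    # RunPod needs both the API key and the Whisper endpoint ID.
    if env.get("RUNPOD_API_KEY") and env.get("WHISPER_RUNPOD_ID"):
        providers.append("runpod-whisper")
    # Local Whisper is documented as a fallback when no cloud service is
    # configured, unless ENABLE_LOCAL_WHISPER is set to "false".
    local_enabled = env.get("ENABLE_LOCAL_WHISPER", "true").lower() != "false"
    if not providers and local_enabled:
        providers.append("local-whisper")
    return providers


if __name__ == "__main__":
    found = configured_transcribers()
    print("Transcription providers:", ", ".join(found) or "none (use --skip-lyrics)")
```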
If you don't need synchronized lyrics, use the --skip-lyrics flag:
karaoke-gen --skip-lyrics "Artist" "Title"
This creates an instrumental-only karaoke video without lyrics overlay.
Note: See lyrics_transcriber_temp/README.md for detailed transcription provider configuration options.
# Generate from local audio file
karaoke-gen ./song.mp3 "Artist Name" "Song Title"
# Search and download audio automatically
karaoke-gen "Rick Astley" "Never Gonna Give You Up"
# Process from YouTube URL
karaoke-gen "https://www.youtube.com/watch?v=dQw4w9WgXcQ" "Rick Astley" "Never Gonna Give You Up"
Offload just the GPU-intensive audio separation to Modal.com while keeping other processing local:
export AUDIO_SEPARATOR_API_URL="https://USERNAME--audio-separator-api.modal.run"
karaoke-gen "Artist" "Title"
# Custom styling
karaoke-gen --style_params_json="./styles.json" "Artist" "Title"
# Generate CDG and TXT packages
karaoke-gen --enable_cdg --enable_txt "Artist" "Title"
# YouTube upload
karaoke-gen --enable_youtube_upload --youtube_description_file="./desc.txt" "Artist" "Title"
# Full production run
karaoke-gen \
--style_params_json="./branding.json" \
--enable_cdg \
--enable_txt \
--brand_prefix="BRAND" \
--enable_youtube_upload \
--youtube_description_file="./description.txt" \
"Artist" "Title"
karaoke-gen --help
The remote CLI submits jobs to a Google Cloud backend that handles all processing. You don't need a GPU or any audio processing libraries installed locally.
- Set the backend URL:
  export KARAOKE_GEN_URL="https://api.nomadkaraoke.com"  # Or your own backend
- Authenticate with Google Cloud:
  gcloud auth login
# Submit a job
karaoke-gen-remote ./song.flac "ABBA" "Waterloo"
# The CLI will:
# 1. Upload your audio file
# 2. Monitor processing progress
# 3. Open lyrics review UI when ready
# 4. Prompt for instrumental selection
# 5. Download all outputs when complete
# List all jobs
karaoke-gen-remote --list
# Resume monitoring an existing job
karaoke-gen-remote --resume abc12345
# Cancel a running job
karaoke-gen-remote --cancel abc12345
# Delete a job and its files
karaoke-gen-remote --delete abc12345
karaoke-gen-remote \
--style_params_json="./karaoke-styles.json" \
--enable_cdg \
--enable_txt \
--brand_prefix=NOMAD \
--enable_youtube_upload \
--youtube_description_file="./youtube-description.txt" \
./song.flac "Artist" "Title"
| Variable | Description | Default |
|---|---|---|
| KARAOKE_GEN_URL | Backend service URL | Required |
| KARAOKE_GEN_AUTH_TOKEN | Admin auth token (for protected endpoints) | Optional |
| REVIEW_UI_URL | Lyrics review UI URL | https://gen.nomadkaraoke.com/lyrics/ |
| POLL_INTERVAL | Seconds between status polls | 5 |
Note: The REVIEW_UI_URL defaults to the hosted lyrics review UI. For local development, set it to http://localhost:5173 if you're running the frontend dev server.
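For scripts that wrap the remote CLI or call the backend directly, the variables above can be loaded once into a small config object. This is a sketch under the defaults listed in the table: `RemoteConfig` and `load_remote_config` are hypothetical names, not part of the karaoke-gen package.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RemoteConfig:
    base_url: str
    auth_token: Optional[str]
    review_ui_url: str
    poll_interval: int


def load_remote_config(env: Optional[Mapping[str, str]] = None) -> RemoteConfig:
    """Read the documented variables, applying the defaults from the table."""
    env = os.environ if env is None else env
    base_url = env.get("KARAOKE_GEN_URL", "").rstrip("/")
    if not base_url:
        # KARAOKE_GEN_URL is the only required variable.
        raise ValueError("KARAOKE_GEN_URL is required")
    return RemoteConfig(
        base_url=base_url,
        auth_token=env.get("KARAOKE_GEN_AUTH_TOKEN") or None,
        review_ui_url=env.get("REVIEW_UI_URL", "https://gen.nomadkaraoke.com/lyrics/"),
        poll_interval=int(env.get("POLL_INTERVAL", "5")),
    )
```

Failing fast on a missing `KARAOKE_GEN_URL` keeps a misconfigured shell from surfacing later as an obscure connection error.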
The backend uses token-based authentication for admin operations (bulk delete, internal worker triggers). For basic job submission and monitoring, authentication is optional.
For admin access:
export KARAOKE_GEN_AUTH_TOKEN="your-admin-token"
The token must match one of the tokens configured in the backend's ADMIN_TOKENS environment variable.
For automated/CI usage:
karaoke-gen-remote -y ./song.flac "Artist" "Title"
The -y flag auto-accepts the default lyrics corrections and selects the clean instrumental.
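In a CI pipeline it can be convenient to assemble the command as an argument list rather than a shell string. The sketch below uses only flags that appear in this README; `build_remote_command` is a hypothetical helper, and the flag order is an assumption that mirrors the examples above.

```python
import shlex
from typing import List, Optional


def build_remote_command(audio: str, artist: str, title: str, *,
                         style_params_json: Optional[str] = None,
                         enable_cdg: bool = False,
                         enable_txt: bool = False,
                         brand_prefix: Optional[str] = None,
                         non_interactive: bool = False) -> List[str]:
    """Build an argv list for karaoke-gen-remote."""
    cmd = ["karaoke-gen-remote"]
    if non_interactive:
        cmd.append("-y")  # auto-accept corrections, pick clean instrumental
    if style_params_json:
        cmd.append(f"--style_params_json={style_params_json}")
    if enable_cdg:
        cmd.append("--enable_cdg")
    if enable_txt:
        cmd.append("--enable_txt")
    if brand_prefix:
        cmd.append(f"--brand_prefix={brand_prefix}")
    # Positional arguments come last: audio file, artist, title.
    cmd += [audio, artist, title]
    return cmd


if __name__ == "__main__":
    argv = build_remote_command("./song.flac", "ABBA", "Waterloo",
                                enable_cdg=True, non_interactive=True)
    # Show the equivalent shell command; pass `argv` to
    # subprocess.run(argv, check=True) to actually run it.
    print(shlex.join(argv))
```

Passing a list to `subprocess.run` avoids shell quoting problems with artist names or titles that contain spaces or quotes.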
Create a styles.json file to customize the karaoke video appearance:
{
"intro": {
"video_duration": 5,
"background_image": "/path/to/title-background.png",
"font": "/path/to/Font.ttf",
"artist_color": "#ffdf6b",
"title_color": "#ffffff"
},
"karaoke": {
"background_image": "/path/to/karaoke-background.png",
"font_path": "/path/to/Font.ttf"
},
"end": {
"background_image": "/path/to/end-background.png"
},
"cdg": {
"font_path": "/path/to/Font.ttf",
"instrumental_background": "/path/to/cdg-background.png"
}
}
When using karaoke-gen-remote, all referenced files are automatically uploaded with your job.
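Because the style file points at images and fonts on disk, a quick pre-check for missing files can save a failed run. The following is a rough sketch: the set of keys treated as file paths is an assumption inferred from the example above, and `referenced_assets` / `missing_assets` are hypothetical helpers rather than karaoke-gen functions.

```python
import json
from pathlib import Path
from typing import List

# Assumption: these keys hold file paths (inferred from the example styles.json).
ASSET_KEYS = {"background_image", "font", "font_path", "instrumental_background"}


def referenced_assets(style: dict) -> List[str]:
    """Collect file paths referenced in the top-level style sections."""
    paths = []
    for section in style.values():
        if not isinstance(section, dict):
            continue
        for key, value in section.items():
            if key in ASSET_KEYS and isinstance(value, str):
                paths.append(value)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(paths))


def missing_assets(style_file: str) -> List[str]:
    """Return referenced paths that do not exist on disk."""
    style = json.loads(Path(style_file).read_text(encoding="utf-8"))
    return [p for p in referenced_assets(style) if not Path(p).is_file()]
```

Calling `missing_assets("./styles.json")` before submitting a job returns an empty list when every referenced file is present.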
A completed job produces:
BRAND-1234 - Artist - Title/
├── Artist - Title (Final Karaoke Lossless 4k).mp4     # ProRes 4K
├── Artist - Title (Final Karaoke Lossless 4k).mkv     # FLAC audio 4K
├── Artist - Title (Final Karaoke Lossy 4k).mp4        # H.264 4K
├── Artist - Title (Final Karaoke Lossy 720p).mp4      # H.264 720p
├── Artist - Title (Final Karaoke CDG).zip             # CDG+MP3 package
├── Artist - Title (Final Karaoke TXT).zip             # TXT+MP3 package
├── Artist - Title (Karaoke).cdg                       # Individual CDG
├── Artist - Title (Karaoke).mp3                       # Karaoke audio
├── Artist - Title (Karaoke).lrc                       # LRC lyrics
├── Artist - Title (Karaoke).ass                       # ASS subtitles
├── Artist - Title (Title).mov                         # Title screen video
├── Artist - Title (End).mov                           # End screen video
├── Artist - Title (Instrumental...).flac              # Clean instrumental
├── Artist - Title (Instrumental +BV...).flac          # With backing vocals
└── stems/                                             # All audio stems
    ├── ...Vocals....flac
    ├── ...Bass....flac
    ├── ...Drums....flac
    └── ...
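A short script can confirm that a finished job folder contains the main outputs from the listing above. This is only a sketch: `missing_outputs` is a hypothetical helper, the CDG and TXT packages are left out on the assumption that they are produced only when --enable_cdg and --enable_txt are passed, and the instrumental and stem files are skipped because parts of their names are elided in the listing.

```python
from pathlib import Path
from typing import List

# Filename suffixes copied from the output listing.
EXPECTED_SUFFIXES = [
    "(Final Karaoke Lossless 4k).mp4",
    "(Final Karaoke Lossless 4k).mkv",
    "(Final Karaoke Lossy 4k).mp4",
    "(Final Karaoke Lossy 720p).mp4",
    "(Karaoke).lrc",
    "(Karaoke).ass",
    "(Title).mov",
    "(End).mov",
]


def missing_outputs(job_dir: str, artist: str, title: str) -> List[str]:
    """Return the expected output filenames that are absent from job_dir."""
    base = f"{artist} - {title}"
    folder = Path(job_dir)
    names = [f"{base} {suffix}" for suffix in EXPECTED_SUFFIXES]
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    for name in missing_outputs("BRAND-1234 - ABBA - Waterloo", "ABBA", "Waterloo"):
        print("missing:", name)
```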
The cloud backend runs on Google Cloud Platform using:
- Cloud Run: Serverless API hosting
- Firestore: Job state management
- Cloud Storage: File uploads and outputs
- Modal.com: GPU-accelerated audio separation
- AudioShake: Lyrics transcription API
- Google Cloud account with billing enabled
- Pulumi CLI
- Modal.com account (for audio separation)
- AudioShake API key
cd infrastructure
# Install dependencies
pip install -r requirements.txt
# Login to Pulumi
pulumi login
# Create a stack
pulumi stack init prod
# Configure GCP project
pulumi config set gcp:project your-project-id
pulumi config set gcp:region us-central1
# Deploy infrastructure
pulumi up
This creates:
- Firestore database
- Cloud Storage bucket
- Artifact Registry
- Service account with IAM roles
- Secret Manager secrets (you add values)
# AudioShake API key
echo -n "your-audioshake-key" | gcloud secrets versions add audioshake-api-key --data-file=-
# Genius API key
echo -n "your-genius-key" | gcloud secrets versions add genius-api-key --data-file=-
# Modal API URL
echo -n "https://your-modal-url" | gcloud secrets versions add audio-separator-api-url --data-file=-
# YouTube OAuth credentials (JSON)
gcloud secrets versions add youtube-oauth-credentials --data-file=./youtube-creds.json
# Dropbox OAuth credentials (JSON)
gcloud secrets versions add dropbox-oauth-credentials --data-file=./dropbox-creds.json
# Google Drive service account (JSON)
gcloud secrets versions add gdrive-service-account --data-file=./gdrive-sa.json
# Build and deploy
gcloud builds submit --config=cloudbuild.yaml
# Get outputs from Pulumi
SA_EMAIL=$(pulumi stack output service_account_email)
BUCKET_NAME=$(pulumi stack output bucket_name)
# Deploy Cloud Run service
gcloud run deploy karaoke-backend \
--image us-central1-docker.pkg.dev/YOUR-PROJECT/karaoke-repo/karaoke-backend:latest \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--service-account $SA_EMAIL \
--memory 2Gi \
--cpu 2 \
--timeout 600 \
--set-env-vars="GOOGLE_CLOUD_PROJECT=YOUR-PROJECT,GCS_BUCKET_NAME=$BUCKET_NAME"
export KARAOKE_GEN_URL="https://your-backend.run.app"
karaoke-gen-remote ./song.flac "Artist" "Title"
The backend exposes a REST API for job management.
POST /api/jobs/upload
Submit a new karaoke generation job with audio file and options.
curl -X POST "https://api.example.com/api/jobs/upload" \
-F "file=@song.flac" \
-F "artist=ABBA" \
-F "title=Waterloo" \
-F "enable_cdg=true" \
-F "enable_txt=true" \
-F "brand_prefix=NOMAD" \
-F "style_params=@styles.json" \
-F "style_karaoke_background=@background.png"
GET /api/jobs/{job_id}
Get job status and details.
curl "https://api.example.com/api/jobs/abc12345"
GET /api/jobs
List all jobs with optional status filter.
curl "https://api.example.com/api/jobs?status=complete&limit=10"
POST /api/jobs/{job_id}/cancel
Cancel a running job.
curl -X POST "https://api.example.com/api/jobs/abc12345/cancel" \
-H "Content-Type: application/json" \
-d '{"reason": "User cancelled"}'
DELETE /api/jobs/{job_id}
Delete a job and its files.
curl -X DELETE "https://api.example.com/api/jobs/abc12345?delete_files=true"
GET /api/review/{job_id}/correction-data
Get correction data for lyrics review.
POST /api/review/{job_id}/complete
Submit corrected lyrics and trigger video rendering.
GET /api/jobs/{job_id}/instrumental-options
Get available instrumental options.
POST /api/jobs/{job_id}/select-instrumental
Submit instrumental selection (clean or with_backing).
curl -X POST "https://api.example.com/api/jobs/abc12345/select-instrumental" \
-H "Content-Type: application/json" \
-d '{"selection": "clean"}'
GET /api/jobs/{job_id}/download-urls
Get download URLs for all output files.
GET /api/jobs/{job_id}/download/{category}/{file_key}
Stream download a specific file.
GET /api/health
Check backend health status.
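The job endpoints above can also be called from Python with the standard library. The sketch below builds the same URLs as the curl examples and polls a job until it finishes. The response schema is not documented here, so the "status" field and the terminal values other than "complete" are assumptions, and `jobs_url`, `fetch_job`, and `wait_for_job` are hypothetical helpers.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Assumption: job JSON carries a "status" field; only "complete" appears in
# this README, the other terminal values are guesses.
TERMINAL_STATUSES = {"complete", "failed", "cancelled"}


def jobs_url(base_url: str, job_id: Optional[str] = None, **query) -> str:
    """Build /api/jobs or /api/jobs/{job_id}, with optional query parameters."""
    url = base_url.rstrip("/") + "/api/jobs"
    if job_id:
        url += "/" + urllib.parse.quote(job_id, safe="")
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return url


def fetch_job(base_url: str, job_id: str, timeout: float = 30) -> dict:
    """GET /api/jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(jobs_url(base_url, job_id), timeout=timeout) as resp:
        return json.load(resp)


def wait_for_job(get_job: Callable[[], dict], poll_interval: float = 5,
                 max_polls: int = 720,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call get_job() until the job reports a terminal status."""
    for _ in range(max_polls):
        job = get_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal status")


# Example (performs network requests, so it is left commented out):
# job = wait_for_job(lambda: fetch_job("https://api.example.com", "abc12345"))
```

Jobs pause for lyrics review and instrumental selection, so a plain polling loop like this will only finish once those steps have been completed through the review UI or the endpoints listed above.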
This error occurs during the finalisation step when the (With Vocals).mkv file is missing. This file is created during lyrics transcription.
Most common cause: No transcription provider configured.
Quick fix:
- Check if transcription providers are configured:
  echo $AUDIOSHAKE_API_TOKEN
  echo $RUNPOD_API_KEY
- If both are empty, set up a provider (see Transcription Provider Setup)
- Or use --skip-lyrics for instrumental-only karaoke:
  karaoke-gen --skip-lyrics "Artist" "Title"
Other causes:
- Invalid API credentials - verify your tokens are correct and active
- API service unavailable - check service status pages
- Network connectivity issues - ensure you can reach the API endpoints
- Transcription timeout - try again or use a different provider
If karaoke-gen runs without errors but produces no synchronized lyrics:
- Check logs - Run with --log_level debug for detailed output:
  karaoke-gen --log_level debug "Artist" "Title"
- Verify environment variables - Ensure API tokens are exported in your shell:
  # Check if set
  printenv | grep -E "(AUDIOSHAKE|RUNPOD|WHISPER)"
  # Set in current session
  export AUDIOSHAKE_API_TOKEN="your_token"
- Test API connectivity - Verify you can reach the transcription service
This warning means no reference lyrics were fetched from online sources (Genius, Spotify, Musixmatch). The transcription will still work, but auto-correction may be less accurate.
To fix:
- Set GENIUS_API_TOKEN for Genius lyrics
- Set SPOTIFY_COOKIE_SP_DC for Spotify lyrics
- Set RAPIDAPI_KEY for Musixmatch lyrics
- Or provide lyrics manually with --lyrics_file /path/to/lyrics.txt
If the output video has quality problems:
- Ensure FFmpeg is properly installed: ffmpeg -version
- Check available codecs: ffmpeg -codecs
- For 4K output, ensure sufficient disk space (10GB+ per track)
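The FFmpeg and disk-space checks above can be scripted as a quick preflight. This is a minimal sketch using only the standard library: `preflight` is a hypothetical helper, and the 10 GB threshold is taken from the guidance above rather than from any limit enforced by karaoke-gen.

```python
import shutil

MIN_FREE_GB = 10  # from the "10GB+ per track" guidance for 4K output


def preflight(output_dir: str = ".") -> dict:
    """Report whether FFmpeg is on PATH and how much disk space is free."""
    free_gb = shutil.disk_usage(output_dir).free / 1024**3
    return {
        "ffmpeg_path": shutil.which("ffmpeg"),  # None when FFmpeg is not on PATH
        "free_gb": round(free_gb, 1),
        "enough_space_for_4k": free_gb >= MIN_FREE_GB,
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```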
If you get CUDA out of memory errors:
# Use a smaller model
export WHISPER_MODEL_SIZE="small" # or "tiny"
# Or force CPU mode
export WHISPER_DEVICE="cpu"
CPU transcription is significantly slower than GPU. For faster processing:
- Use a smaller model (tiny or base)
- Consider using cloud transcription (AudioShake or RunPod)
- On Apple Silicon, the small model offers good speed/quality balance
Whisper models are downloaded on first use (~1-3GB depending on size). If downloads fail:
- Check your internet connection
- Set a custom cache directory: export WHISPER_CACHE_DIR="/path/with/space"
- Models are cached in ~/.cache/whisper/ by default
If you get "whisper-timestamped is not installed":
pip install "karaoke-gen[local-whisper]"
# Or install directly:
pip install whisper-timestamped
If you want to disable local Whisper (e.g., to force cloud transcription):
export ENABLE_LOCAL_WHISPER="false"
# Run all tests
pytest tests/ backend/tests/ -v
# Run only unit tests
pytest tests/unit/ -v
# Run with coverage
pytest tests/unit/ -v --cov=karaoke_gen --cov-report=term-missing
karaoke-gen/
├── karaoke_gen/              # Core CLI package
│   ├── utils/
│   │   ├── gen_cli.py        # Local CLI (karaoke-gen)
│   │   └── remote_cli.py     # Remote CLI (karaoke-gen-remote)
│   ├── karaoke_finalise/     # Video encoding, packaging, distribution
│   └── style_loader.py       # Unified style configuration
├── backend/                  # Cloud backend (FastAPI)
│   ├── api/routes/           # API endpoints
│   ├── workers/              # Background processing workers
│   └── services/             # Business logic services
├── infrastructure/           # Pulumi IaC for GCP
├── docs/                     # Documentation
└── tests/                    # Test suite
MIT
Contributions are welcome! Please see our contributing guidelines.