
🎙️ Whiscribe (Audio → Text Converter)

Whiscribe is a lightweight, CPU-only speech-to-text app powered by faster-whisper. Run it in your browser with a single Python script — no GPU, no cloud, no account required.


✨ Features

  • Runs entirely on CPU — no GPU, MPS or external services required
  • Lightweight and minimal — single Python script, no infrastructure stack (no Nginx, no databases, no Go services)
  • Model flexibility — supports all standard (major and multilingual) and distilled faster-whisper models
  • Fully configurable — adjust VAD sensitivity, segment limits and beam size
  • Handles large files — supports audio files up to 100 MB
  • Docker-ready — non-root container with health check and XSRF protection
  • MIT licensed — free to use, modify and distribute

📸 Screenshots

*(Two screenshots of the Whiscribe browser UI.)*


🆚 How It Compares

| Tool | CPU-only | Browser UI | Setup complexity | Status |
| --- | --- | --- | --- | --- |
| Whiscribe | ✅ | Streamlit | Single Python script | ✅ Active |
| Whishper | ✅ | Svelte | Docker Compose + Go backend | ⚠ v3 frozen (v4 in progress) |
| Whisper-WebUI | ⚠ Manual config | ✅ Gradio | CUDA 12.8 default | ✅ Active |
| Scriberr | ✅ | React | Go + SQLite + Docker | ⚠ Inactive since Dec 2024 |

Whiscribe's niche: the lowest-friction local transcription option — no multi-service stack, no GPU assumption, no database.


🖥️ Requirements

| Requirement | Notes |
| --- | --- |
| Python 3.9+ | Tested with 3.9 and 3.12 |
| Packages | `faster-whisper`, `streamlit`, `torch` |

🚀 Installation

```bash
# 1. Clone the repo
git clone https://github.com/sungurerdim/whiscribe.git
cd whiscribe

# 2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 4. Run the app
streamlit run whiscribe.py
```
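To confirm the dependencies installed correctly before launching, a quick check like the following can help. This is a sketch, not part of Whiscribe — `missing_modules` is a helper written here for illustration:

```python
import importlib.util

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# The three packages from requirements (import names, not pip names)
missing = missing_modules(["faster_whisper", "streamlit", "torch"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found")
```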

⚙️ Usage

  • Upload an audio file (opus, mp3, wav, flac, m4a, aac, mp4, ogg, webm, mov, 3gp, aiff, aif) up to 100 MB
  • Adjust segment duration, VAD threshold or beam size if needed
  • Click "Transcribe" and wait for processing to finish
  • View, copy or download the transcript (as .txt)
  • Click Reset to start over with a new file
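The upload constraints above (allowed extensions, 100 MB cap) can be checked before a file is submitted. The sketch below is illustrative only — it is not Whiscribe's actual validation code, and `validate_upload` is a hypothetical helper:

```python
from pathlib import Path
from typing import Optional

# Formats and size limit taken from the usage notes above
ALLOWED_EXTENSIONS = {
    "opus", "mp3", "wav", "flac", "m4a", "aac", "mp4",
    "ogg", "webm", "mov", "3gp", "aiff", "aif",
}
MAX_FILE_SIZE = 100 * 1024 * 1024  # 100 MB

def validate_upload(name: str, size_bytes: int) -> Optional[str]:
    """Return an error message, or None if the file is acceptable."""
    ext = Path(name).suffix.lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"Unsupported format: .{ext}"
    if size_bytes > MAX_FILE_SIZE:
        return "File exceeds the 100 MB limit"
    return None
```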

Default Settings

| Setting | Default |
| --- | --- |
| Model | `faster-whisper-small` |
| Min. speech | 250 ms |
| Max. speech | 30 s |
| VAD threshold | 0.15 |
| Beam size | 5 |
| Max file size | 100 MB |
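The same defaults, expressed as code. Note this dataclass is a hypothetical mirror of the table, not Whiscribe's `TranscriptionConfig` — of these field names, only `vad_threshold` and `beam_size` are confirmed by the Programmatic Usage section below:

```python
from dataclasses import dataclass

@dataclass
class DefaultSettings:
    """Hypothetical mirror of Whiscribe's default settings table."""
    model: str = "Systran/faster-whisper-small"
    min_speech_ms: int = 250      # minimum speech segment, in milliseconds
    max_speech_s: int = 30        # maximum speech segment, in seconds
    vad_threshold: float = 0.15   # matches TranscriptionConfig
    beam_size: int = 5            # matches TranscriptionConfig
    max_file_size_mb: int = 100
```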

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `LOG_LEVEL` | `INFO` | Log verbosity: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` |
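Reading `LOG_LEVEL` from the environment typically looks like the following. This is a sketch of the usual stdlib pattern, not Whiscribe's exact code:

```python
import logging
import os

# Fall back to INFO when LOG_LEVEL is unset or not a valid level name
level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)

logging.basicConfig(level=level)
logger = logging.getLogger("whiscribe")
logger.info("Log level set to %s", logging.getLevelName(level))
```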

Programmatic Usage

The transcription engine can be used independently of the Streamlit UI:

```python
from config import TranscriptionConfig
from transcriber import load_model, transcribe_bytes

# Load a model (cached for 1 hour)
model = load_model("Systran/faster-whisper-small")

# Read audio file
with open("audio.mp3", "rb") as f:
    audio_bytes = f.read()

# Transcribe with custom config
config = TranscriptionConfig(vad_threshold=0.2, beam_size=3)
text, elapsed = transcribe_bytes(audio_bytes, model, config)
print(f"Transcript ({elapsed:.1f}s): {text}")
```
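Building on that API, transcribing a whole folder could be wrapped as below. This is a sketch: `transcribe_folder` is a helper defined here, not part of Whiscribe, and the real engine would be injected via `transcribe_fn` so the helper itself stays model-agnostic:

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

def transcribe_folder(
    folder: str,
    transcribe_fn: Callable[[bytes], str],
    extensions: Tuple[str, ...] = (".mp3", ".wav", ".flac"),
) -> Dict[str, str]:
    """Apply transcribe_fn to every matching file; return {filename: transcript}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in extensions:
            results[path.name] = transcribe_fn(path.read_bytes())
    return results

# With the real engine, transcribe_fn would wrap the call shown above, e.g.:
#   lambda b: transcribe_bytes(b, model, config)[0]
```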

🐳 Docker

```bash
# Build
docker build -t whiscribe .

# Run
docker run -p 8501:8501 whiscribe
```

📝 License

This project is licensed under the MIT License. See the LICENSE file for details. © 2025 Sungur Zahid Erdim

🤝 Contributing

Bug reports and pull requests are welcome!

Development Setup

```bash
# Clone and setup
git clone https://github.com/sungurerdim/whiscribe.git
cd whiscribe
make setup          # Windows
make setup-unix     # Linux/macOS

# Run locally
make run            # Windows
make run-unix       # Linux/macOS

# Lint
make lint           # Windows
make lint-unix      # Linux/macOS

# Test
make test           # Windows
make test-unix      # Linux/macOS
```

Code Style

  • Python 3.9+ compatible
  • Linted with ruff (config in pyproject.toml)
  • Type annotations on all function signatures

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| "No speech detected" | Audio too quiet or wrong format | Lower the VAD threshold (e.g. 0.05); verify the file plays correctly |
| Model download fails | Network issue or disk full | Check internet connection and disk space, or set a custom cache folder |
| Out of memory | Model too large for available RAM | Use a smaller model (`tiny` or `base`) |
| Slow transcription | Large file + large model on CPU | Use a smaller model or reduce beam size |
| Import error on startup | Missing or incompatible dependency | Run `pip install -r requirements.txt` in your virtual environment |
