Ollama for ASR. Run open-source speech recognition models on your own machine or in your private cloud. Freedom from proprietary APIs, full privacy, no compromise on quality.
Pluggable and extensible — Mix and match transcription, alignment, diarization, and PII detection models. Swap components without breaking your pipeline. Completely open source and free.
Drop-in integration — OpenAI and ElevenLabs compatible APIs mean you can point your existing code at Dalston and it just works. Need more power? The native Dalston API unlocks advanced functionality like multi-engine routing, pipeline customization, and detailed engine metadata.
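Because the gateway speaks the OpenAI transcription API, existing HTTP clients only need a new base URL. Below is a minimal stdlib sketch that builds such a request; the `/v1/audio/transcriptions` route follows OpenAI's API shape, while the `localhost:8000` address and `whisper-1` model name are illustrative placeholders for your deployment:

```python
import io
import urllib.request
import uuid

# Illustrative endpoint: OpenAI-compatible transcription route on a local
# gateway. Substitute the host/port of your own Dalston deployment.
DALSTON_URL = "http://localhost:8000/v1/audio/transcriptions"

def build_transcription_request(url: str, audio: bytes, filename: str,
                                model: str = "whisper-1") -> urllib.request.Request:
    """Build a multipart/form-data POST matching OpenAI's transcription API."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # "model" form field
    body.write((f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="model"\r\n\r\n'
                f"{model}\r\n").encode())
    # "file" form field carrying the raw audio bytes
    body.write((f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
                "Content-Type: application/octet-stream\r\n\r\n").encode())
    body.write(audio)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

req = build_transcription_request(DALSTON_URL, b"\x00\x01", "meeting.wav")
# urllib.request.urlopen(req) would send it to a running gateway.
```

The same idea applies to any OpenAI SDK: point its base URL at the gateway and keep the rest of your code unchanged.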
Transcribe audio files or live streams with speaker diarization, word-level timestamps, and GPU acceleration. Run it on your own infrastructure.
```shell
# One-command local transcription (M57 zero-config bootstrap)
# - auto-starts local server if missing
# - auto-ensures default model (distil-small)
DALSTON_SECURITY_MODE=none dalston transcribe tests/audio/test_merged.wav --format json
```

```json
{
  "text": "Hello, welcome to the meeting...",
  "segments": [
    {"speaker": "SPEAKER_01", "start": 0.0, "end": 2.5, "text": "Hello, welcome to the meeting."},
    {"speaker": "SPEAKER_02", "start": 2.8, "end": 5.1, "text": "Thanks for having me."}
  ]
}
```

To install from source:

```shell
git clone https://github.com/ssarunic/dalston.git
cd dalston
pip install -e ".[gateway,orchestrator,dev]"
pip install -e ./sdk -e ./cli
DALSTON_SECURITY_MODE=none dalston transcribe tests/audio/test_merged.wav --format json
```

For distributed Docker deployments, see the deployment guide.
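The segment structure in the JSON output lends itself to standard post-processing; for instance, diarized segments map directly onto SRT subtitle cues. A sketch using only the fields from the sample output (prefixing each cue with the speaker label is our own convention here, not a Dalston feature):

```python
import json

# Sample response in the shape shown above (speaker/start/end/text per segment).
response = json.loads("""
{
  "text": "Hello, welcome to the meeting...",
  "segments": [
    {"speaker": "SPEAKER_01", "start": 0.0, "end": 2.5, "text": "Hello, welcome to the meeting."},
    {"speaker": "SPEAKER_02", "start": 2.8, "end": 5.1, "text": "Thanks for having me."}
  ]
}
""")

def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render diarized segments as numbered SRT cues with speaker prefixes."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
                    f"{seg['speaker']}: {seg['text']}\n")
    return "\n".join(cues)

print(to_srt(response["segments"]))
```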
- Batch & Real-time — File uploads or WebSocket streaming
- Speaker Diarization — Identify who said what
- Word Timestamps — Precise timing for every word
- OpenAI & ElevenLabs Compatible — Drop-in replacement for existing integrations
- Modular Engines — Faster Whisper, WhisperX, Pyannote, and more
- Private by Default — Runs entirely on your infrastructure, no data leaves your environment
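As a small illustration of "who said what", per-speaker talk time falls straight out of the diarized segments. The sample data below mirrors the output shown earlier; the field names come from that sample:

```python
from collections import defaultdict

# Sample segments mirroring the diarized JSON output shown earlier
# (speaker/start/end/text per segment).
segments = [
    {"speaker": "SPEAKER_01", "start": 0.0, "end": 2.5,
     "text": "Hello, welcome to the meeting."},
    {"speaker": "SPEAKER_02", "start": 2.8, "end": 5.1,
     "text": "Thanks for having me."},
]

def talk_time(segments: list[dict]) -> dict[str, float]:
    """Sum the seconds of speech attributed to each speaker."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    # Round away floating-point accumulation noise for display.
    return {spk: round(t, 3) for spk, t in totals.items()}

print(talk_time(segments))
```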