Production-ready REST API for audio processing using WhisperX with Temporal workflow orchestration. Features transcription, alignment, diarization, and medical RAG integration with local LLMs via LM Studio.
- Audio Transcription - State-of-the-art speech-to-text with WhisperX
- Speaker Diarization - Multi-speaker identification and segmentation
- Temporal Workflows - Asynchronous job processing with retry logic
- Medical Processing - PHI detection, SOAP notes, entity extraction
- Web Interface - Streamlit UI for live recording and transcription
- Local LLM Integration - LM Studio support for medical AI
Quick start with Docker:

```bash
# Configure environment
cp .env.example .env
# Edit .env with your HF_TOKEN

# Start all services
docker-compose up -d

# Access services
# API: http://localhost:8000/docs
# Temporal UI: http://localhost:8233
# Streamlit: http://localhost:8501
```

Or run the stack locally without Docker:

```bash
# Install dependencies
uv sync

# Configure environment
cp .env.example .env

# Start full application (FastAPI + Temporal + Streamlit)
make dev
```

| Service | URL | Description |
|---|---|---|
| FastAPI | http://localhost:8000 | REST API with Scalar/Swagger docs |
| Streamlit UI | http://localhost:8501 | Web interface for audio processing |
| Temporal UI | http://localhost:8233 | Workflow monitoring dashboard |
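Once the stack is up, a quick reachability check for the three services (a sketch; it assumes FastAPI's default OpenAPI route is enabled):

```bash
# Each service should answer with HTTP 200
curl -s -o /dev/null -w "FastAPI: %{http_code}\n"     http://localhost:8000/openapi.json
curl -s -o /dev/null -w "Temporal UI: %{http_code}\n" http://localhost:8233
curl -s -o /dev/null -w "Streamlit: %{http_code}\n"   http://localhost:8501
```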
The Streamlit web interface supports:
- Live audio recording from browser
- Audio/video file upload
- Real-time transcription display
- Speaker diarization visualization
- Patient workflow management
- Medical processing (PHI, SOAP notes, entities)
The REST API exposes the following endpoint groups.

Core:
- POST /speech-to-text - Full processing pipeline
- POST /speech-to-text-url - Process from URL
- GET /tasks/{task_id} - Check workflow status
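A minimal sketch of driving the core pipeline from the command line. The multipart field name (`audio_file`) and the response shape are assumptions; the interactive docs at http://localhost:8000/docs show the exact schema:

```bash
# Submit an audio file for the full pipeline (field name assumed; see /docs)
curl -X POST "http://localhost:8000/speech-to-text" \
  -F "audio_file=@consultation.wav"

# The response should reference a Temporal-backed task; poll its status
curl "http://localhost:8000/tasks/<task_id>"
```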
ASR:
- POST /asr - Transcribe only
- POST /asr/align - Align transcript
- POST /asr/diarize - Diarize speakers
- POST /asr/combine - Combine results
Medical:
- POST /medical/process - Full medical pipeline
- POST /medical/soap - Generate SOAP note
- POST /medical/entities - Extract medical entities
- POST /medical/chat - RAG-powered chatbot
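A sketch of querying the RAG-powered chatbot; the JSON field name below is an assumption, so check the request schema in the interactive docs:

```bash
# Ask the medical chatbot a question (field name assumed; see /docs)
curl -X POST "http://localhost:8000/medical/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "Which medications were mentioned in the last consultation?"}'
```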
Admin:
- GET /admin - Database interface (SQLAdmin)
- GET /admin/patients - List all patients
- GET /admin/database/stats - Database statistics
Supported input formats:
- Audio: .oga, .m4a, .aac, .wav, .amr, .wma, .awb, .mp3, .ogg
- Video: .wmv, .mkv, .avi, .mov, .mp4
Supported Whisper models:
- Standard: tiny, base, small, medium, large-v3-turbo
- Distilled: distil-large-v3, distil-medium.en, distil-small.en
- Specialized: nyrahealth/faster_CrisperWhisper (medical)
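The model is typically pinned via configuration rather than per request; the variable name below is a placeholder assumption, so check .env.example for the key this project actually uses:

```bash
# Hypothetical .env entry - the variable name is an assumption, not confirmed by this README
WHISPER_MODEL=distil-large-v3
```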
Common Makefile targets for development:

```bash
# Start services
make dev        # Full application (API + Temporal + Streamlit)
make server     # FastAPI only
make worker     # Temporal + worker
make streamlit  # Streamlit UI only

# Stop services
make stop       # Stop all processes

# Temporal management
make temporal-fresh    # Clean restart Temporal
make check-activities  # Monitor running workflows

# Testing
make test              # All tests
make unit-test         # Unit tests with coverage
make integration-test  # Integration tests
make test-coverage     # Generate coverage report

# Code quality
make lint    # Run linters
make format  # Format code
```

Requirements:
- Python: 3.10+
- Temporal CLI: Required for local development (install from GitHub releases)
- HF_TOKEN: Required for model downloads (get from HuggingFace)
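A minimal setup sketch covering these prerequisites. The Homebrew install and the manual dev-server start are assumptions for convenience; `make dev` and `make temporal-fresh` normally manage Temporal for you:

```bash
# Required for model downloads
echo "HF_TOKEN=hf_xxxxxxxxxxxxxxxx" >> .env

# Temporal CLI: macOS via Homebrew shown here; otherwise install a binary from GitHub releases
brew install temporal

# Optionally run the local dev server by hand (UI defaults to http://localhost:8233)
temporal server start-dev
```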
Medical AI setup with LM Studio:

```bash
# Install LM Studio (https://lmstudio.ai/)

# Download models:
# - MedAlpaca-7B or Meditron-7B (generation)
# - nomic-embed-text-v1.5 (embeddings)

# Configure .env
cp .env.example .env

# Start LM Studio server
# Local Server tab → Select model → Start Server

# Start application with Docker
make build

# Or start application with local Python
make dev
```

Medical processing features:
- PHI detection & anonymization
- Medical entity extraction (diagnoses, medications, procedures)
- SOAP note generation (Subjective, Objective, Assessment, Plan)
- Semantic search with vector embeddings (FAISS)
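Before running the medical pipeline, you can confirm LM Studio is actually serving; it exposes an OpenAI-compatible API on port 1234 by default. The model identifier below is an assumption; use whatever name /v1/models reports for the model you loaded:

```bash
# List the models LM Studio has loaded
curl http://localhost:1234/v1/models

# Minimal chat completion against the generation model (model name assumed)
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "medalpaca-7b",
        "messages": [{"role": "user", "content": "Reply with OK if you can read this."}],
        "max_tokens": 16
      }'
```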
Performance (per consultation):
- GPU (RTX 4090/A10): ~15-25s
- CPU: ~50-90s
Processing flow:

```
Client → FastAPI → Temporal → Activities (Transcribe → Align → Diarize)
                                  ↓
                         Patient DB (SQLite)
                                  ↓
                       Medical LLM (LM Studio)
```
Troubleshooting:

Model download fails:

```bash
# Verify HF_TOKEN
curl -H "Authorization: Bearer YOUR_TOKEN" https://huggingface.co/api/whoami
```

Temporal workflows stuck:

```bash
make temporal-fresh  # Clean restart
```
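If a full restart is heavier than you need, the Temporal CLI can also show what is currently running (standard CLI command; `make check-activities` likely wraps a similar check):

```bash
# Inspect workflows on the local Temporal dev server
temporal workflow list
```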
LM Studio not responding:

```bash
curl http://localhost:1234/v1/models
```

Acknowledgements:
- whisperX - Core library
- ahmetoner/whisper-asr-webservice
- alexgo84/whisperx-server
License: MIT