Hold-to-talk voice transcription with Gemini Pro aggregation. Supports two STT engines:
- Google Chirp 3 — English transcription via Google Cloud Speech-to-Text
- Sarvam AI — Indian language transcription (Hindi, Tamil, Telugu, etc.) with auto language detection
Hold Left Shift + Left Ctrl → Audio chunked every 5s → STT (parallel)
Release Left Shift + Left Ctrl → Gemini Pro aggregates → Final text output
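The hold/release behaviour described above is essentially a small state machine. The following is a minimal sketch, not the app's actual code. The key names ("shift_l", "ctrl_l", "esc") and the HoldToTalk and Action names are hypothetical, and a real keyboard listener would translate its own key events into calls to press() and release(). Recording starts only once both keys are held and stops as soon as either one is released.

```python
from enum import Enum, auto


class Action(Enum):
    """What the recorder should do in response to a key event."""
    NONE = auto()
    START = auto()
    STOP = auto()
    EXIT = auto()


# Hypothetical key names; a real listener would map its own key objects onto these.
CHORD = frozenset({"shift_l", "ctrl_l"})


class HoldToTalk:
    """Track held keys: START when the whole chord is down, STOP when the chord breaks."""

    def __init__(self, chord: frozenset[str] = CHORD) -> None:
        self.chord = chord
        self.held: set[str] = set()
        self.recording = False

    def press(self, key: str) -> Action:
        if key == "esc":
            return Action.EXIT
        self.held.add(key)
        # Start only on the transition into "all chord keys held".
        if not self.recording and self.chord <= self.held:
            self.recording = True
            return Action.START
        return Action.NONE

    def release(self, key: str) -> Action:
        self.held.discard(key)
        # Releasing any chord key ends the recording.
        if self.recording and not self.chord <= self.held:
            self.recording = False
            return Action.STOP
        return Action.NONE
```

Repeated press events while both keys stay down (keyboard auto-repeat) return Action.NONE, so an utterance is not restarted midway.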
Requirements:

- Python 3.10+
- Gemini API key
For Chirp 3 mode:
- Google Cloud project with Speech-to-Text API enabled
- gcloud CLI installed and authenticated
For Sarvam mode:
- Sarvam AI API key (from sarvam.ai)
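Before launching, a quick pre-flight check can catch a missing key early. This is a hypothetical helper (preflight and REQUIRED_ENV are not part of the project). The variable names come from the configuration steps in this README, and the check does not verify gcloud credentials or whether the Speech-to-Text API is enabled.

```python
import sys
from collections.abc import Mapping

# Required environment variables per engine, taken from this README's configuration steps.
REQUIRED_ENV = {
    "chirp": ("GOOGLE_CLOUD_PROJECT", "GEMINI_API_KEY"),
    "sarvam": ("SARVAM_API_KEY", "GEMINI_API_KEY"),
}


def preflight(
    engine: str,
    env: Mapping[str, str],
    version: tuple[int, int] = sys.version_info[:2],
) -> list[str]:
    """Return human-readable problems; an empty list means the setup looks complete."""
    if engine not in REQUIRED_ENV:
        return [f"unknown engine {engine!r}; expected one of {sorted(REQUIRED_ENV)}"]
    problems = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    for name in REQUIRED_ENV[engine]:
        if not env.get(name):  # treat unset and empty values alike
            problems.append(f"missing environment variable {name}")
    return problems
```

For example, preflight("sarvam", os.environ) returns an empty list when SARVAM_API_KEY and GEMINI_API_KEY are both set on Python 3.10 or newer.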
Installation:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Chirp 3 setup (Google Cloud):

```bash
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
gcloud services enable speech.googleapis.com
```
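The CHIRP_REGION and CHIRP_RECOGNIZER settings in the Chirp 3 configuration presumably map onto a Speech-to-Text v2 recognizer resource. Assuming that (the README does not say which API version main.py uses), the sketch below builds the regional endpoint and the recognizer name. chirp_recognizer is a hypothetical helper; "_" is the v2 placeholder for the default recognizer, whose settings are supplied with each request.

```python
def chirp_recognizer(
    project: str, region: str = "us-central1", recognizer: str = "_"
) -> tuple[str, str]:
    """Return (api_endpoint, recognizer resource name) for a Speech-to-Text v2 request."""
    # Regional locations use a region-prefixed host; "global" uses the plain host.
    endpoint = "speech.googleapis.com" if region == "global" else f"{region}-speech.googleapis.com"
    name = f"projects/{project}/locations/{region}/recognizers/{recognizer}"
    return endpoint, name
```

With the google-cloud-speech package, the endpoint would typically be passed as ClientOptions(api_endpoint=...) when constructing speech_v2.SpeechClient, and the name as the recognizer field of a RecognizeRequest.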
Chirp 3 configuration:

```bash
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GEMINI_API_KEY="your-gemini-api-key"
# Optional (defaults shown)
export CHIRP_REGION="us-central1"
export CHIRP_RECOGNIZER="_"
```

Sarvam configuration:

```bash
export SARVAM_API_KEY="your-sarvam-api-key"
export GEMINI_API_KEY="your-gemini-api-key"
# Optional; defaults to "unknown" (auto-detect)
export SARVAM_LANGUAGE_CODE="hi-IN"
```

Running:

Via router (edit STT_ENGINE in run.py to switch between "chirp" and "sarvam"):
```bash
python run.py
```

Directly:

```bash
python main.py          # Chirp 3
python sarvam_main.py   # Sarvam AI
```

Usage:

- Hold Left Shift + Left Ctrl to start recording
- Speak; audio is chunked every 5 seconds and sent to the selected STT engine (Chirp 3 or Sarvam AI)
- Release Left Shift + Left Ctrl to stop and get the aggregated result
- The final text is copied to your clipboard, so you can paste it anywhere with Cmd+V
- Press ESC to exit
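The router can be pictured as a lookup from STT_ENGINE to the matching script. This sketch is hypothetical (ENGINE_SCRIPTS, script_for and launch are illustrative names, not taken from run.py). It uses runpy.run_path so the chosen script runs much as if it had been started with python main.py or python sarvam_main.py.

```python
import runpy
from pathlib import Path

# Hypothetical mirror of run.py: edit this constant to switch engines.
STT_ENGINE = "chirp"

ENGINE_SCRIPTS = {"chirp": "main.py", "sarvam": "sarvam_main.py"}


def script_for(engine: str) -> str:
    """Map an engine name to the script that implements it."""
    try:
        return ENGINE_SCRIPTS[engine]
    except KeyError:
        raise ValueError(
            f"STT_ENGINE must be one of {sorted(ENGINE_SCRIPTS)}, got {engine!r}"
        ) from None


def launch(engine: str = STT_ENGINE) -> None:
    """Run the selected engine script from the same directory as this file."""
    script = Path(__file__).resolve().with_name(script_for(engine))
    runpy.run_path(str(script), run_name="__main__")
```

Keeping the mapping in one dictionary means an unknown STT_ENGINE value fails immediately with a clear error instead of silently falling back to one engine.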
macOS:

Grant Accessibility permission when prompted: System Preferences → Privacy & Security → Accessibility → Enable for Terminal/IDE.

Because key-by-key typing into the focused window can be unreliable, the app delivers the final text on macOS by copying it to the clipboard with pbcopy (installed by default) and pasting it with Cmd+V. Make sure the target window accepts paste operations. Set ECHOFLOW_MAC_PASTE=0 to force key-by-key typing instead of the automatic paste.
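The macOS output path described above can be sketched as two small helpers. Both are hypothetical: use_auto_paste assumes any ECHOFLOW_MAC_PASTE value other than 0 keeps auto-paste enabled, and copy_to_clipboard shells out to pbcopy. The Cmd+V keystroke itself would come from whichever keyboard-control library the app uses, which the README does not name.

```python
import os
import subprocess
import sys
from collections.abc import Mapping


def use_auto_paste(env: Mapping[str, str] = os.environ, platform: str = sys.platform) -> bool:
    """Return True when the final text should be pasted with Cmd+V rather than typed.

    Assumption: only the value "0" disables auto-paste; unset or any other value keeps it on.
    """
    return platform == "darwin" and env.get("ECHOFLOW_MAC_PASTE", "1") != "0"


def copy_to_clipboard(text: str) -> None:
    """Place text on the macOS clipboard via pbcopy, which ships with macOS."""
    # pbcopy interprets its input according to the locale, so force UTF-8 to keep
    # non-ASCII output (for example Hindi or Tamil from Sarvam) intact.
    subprocess.run(
        ["pbcopy"],
        input=text.encode("utf-8"),
        env={**os.environ, "LANG": "en_US.UTF-8"},
        check=True,
    )
```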
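Linux:

You may need to add your user to the input group:

```bash
sudo usermod -aG input $USER
# Log out and back in for the group change to take effect
```

Architecture:

Single file per engine (main.py for Chirp 3, sarvam_main.py for Sarvam AI); run.py routes between them. No tests, no build step.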
- Audio capture: a sounddevice.InputStream callback fills a buffer
- Chunking: a timer thread drains the buffer every 5 s
- STT: ThreadPoolExecutor(max_workers=5) sends chunks in parallel
- Aggregation: Gemini aggregates all chunk transcripts into clean text
- Output: Copies to clipboard and pastes via Cmd+V on macOS
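The capture, chunking, STT and aggregation stages above can be sketched as one class, under stated assumptions: ChunkedTranscriber and the transcribe/aggregate callables are hypothetical stand-ins for the real Chirp 3/Sarvam and Gemini calls, and audio arrives as raw bytes. A background thread wakes every interval seconds (5 s by default, as in this README) to drain the buffer, each non-empty chunk is submitted to a ThreadPoolExecutor with max_workers=5, and stop() flushes the remainder and hands the chunk transcripts, in order, to the aggregator.

```python
from __future__ import annotations

import threading
from collections.abc import Callable
from concurrent.futures import Future, ThreadPoolExecutor


class ChunkedTranscriber:
    """Buffer audio, drain it periodically, transcribe chunks in parallel, aggregate on stop."""

    def __init__(
        self,
        transcribe: Callable[[bytes], str],     # stand-in for the Chirp 3 / Sarvam call
        aggregate: Callable[[list[str]], str],  # stand-in for the Gemini aggregation call
        interval: float = 5.0,
        max_workers: int = 5,
    ) -> None:
        self._transcribe = transcribe
        self._aggregate = aggregate
        self._interval = interval
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._buffer = bytearray()
        self._futures: list[Future] = []
        self._stop = threading.Event()
        self._thread: threading.Thread | None = None

    def feed(self, data: bytes) -> None:
        """Append raw audio; in the app this would run inside the InputStream callback."""
        with self._lock:
            self._buffer.extend(data)

    def _drain(self) -> None:
        """Turn everything buffered so far into one chunk and submit it to the pool."""
        with self._lock:
            chunk = bytes(self._buffer)
            self._buffer.clear()
            if chunk:
                # Futures stay in submission order, so transcripts keep speaking order.
                self._futures.append(self._pool.submit(self._transcribe, chunk))

    def _run(self) -> None:
        # Event.wait returns False on timeout (drain again) and True once stop() sets it.
        while not self._stop.wait(self._interval):
            self._drain()

    def start(self) -> None:
        self._stop.clear()
        self._futures = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> str:
        """Stop the timer thread, flush the tail, wait for every chunk, then aggregate."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        self._drain()
        texts = [f.result() for f in self._futures]
        return self._aggregate(texts)
```

Waiting on the futures in submission order means a slow chunk can delay the final result but never reorders the transcript that Gemini receives.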