Telegram bot that converts voice messages into text using local Whisper and ffmpeg. Supports a multilingual interface (EN / RU / UK).
- 🎙️ Voice message transcription
- 🌍 Multilingual interface (English / Русский / Українська)
- 🗣️ Language selection via /start and /language commands
- 🧠 Local Whisper model (no external APIs)
- ☁️ Optional Deepgram cloud transcription backend
- 🔊 Audio conversion via ffmpeg (OGG/MP3/MP4 → WAV 16 kHz)
- ⚙️ Configurable via environment variables
- 📝 Structured logging

Requirements:
- Python 3.10+
- ffmpeg
- Telegram Bot Token
git clone https://github.com/machinatororis/telegram-voice2text-bot.git
cd telegram-voice2text-bot
python -m venv .venv
source .venv/bin/activate # Linux / macOS
or
.venv\Scripts\activate # Windows
pip install -r requirements.txt
Create a .env file in the project root (it is ignored by git).
You can use .env.example as a starting point.
BubbleVoice supports multiple speech-to-text backends. The active backend is selected via environment variables.
By default, the bot uses local Whisper for transcription:
TRANSCRIBER_BACKEND=whisper

This mode:
- Runs fully locally.
- Does not require external APIs.
- Requires ffmpeg and Whisper to be available on the system.
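
The conversion step that local mode depends on can be sketched as follows. This is a minimal illustration, not the bot's actual code: the function names `build_ffmpeg_command` and `convert_to_wav` are hypothetical, and mono output is an assumption based on the 16 kHz mono WAV format mentioned for Deepgram. The ffmpeg flags themselves (`-ar`, `-ac`, `-f`, `-y`) are standard.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(src: Path, dst: Path, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg command that converts src to 16 kHz mono WAV (hypothetical helper)."""
    return [
        ffmpeg,
        "-y",            # overwrite the output file if it already exists
        "-i", str(src),  # input file (OGG, MP3, MP4, ...)
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to mono (assumption, see lead-in)
        "-f", "wav",     # force WAV output
        str(dst),
    ]


def convert_to_wav(src: Path, dst: Path, ffmpeg: str = "ffmpeg") -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(src, dst, ffmpeg), check=True, capture_output=True)
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.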
Deepgram can be used as an alternative cloud-based transcription backend.
Requirements:
- A Deepgram account.
- A valid Deepgram API key.
Environment variables:
TRANSCRIBER_BACKEND=deepgram
DG_API_KEY=your_deepgram_api_key

Bot token and logging:
BOT_TOKEN=your_telegram_bot_token
LOG_LEVEL=INFO
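
As a rough sketch of how these variables might be read and validated at startup: the `Settings` class and `load_settings` function below are hypothetical and not taken from the project. Defaulting `LOG_LEVEL` to `INFO` and rejecting unknown backend names are assumptions; only the `whisper` default for `TRANSCRIBER_BACKEND` comes from the text above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED_BACKENDS = {"whisper", "deepgram"}


@dataclass(frozen=True)
class Settings:
    bot_token: str
    backend: str
    dg_api_key: Optional[str]
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read bot settings from an environment-like mapping (hypothetical helper)."""
    backend = env.get("TRANSCRIBER_BACKEND", "whisper").strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported TRANSCRIBER_BACKEND: {backend!r}")

    bot_token = env.get("BOT_TOKEN", "").strip()
    if not bot_token:
        raise ValueError("BOT_TOKEN is required")

    dg_api_key = env.get("DG_API_KEY", "").strip() or None
    if backend == "deepgram" and dg_api_key is None:
        raise ValueError("DG_API_KEY is required when TRANSCRIBER_BACKEND=deepgram")

    # Assumption: INFO is used when LOG_LEVEL is not set.
    log_level = env.get("LOG_LEVEL", "INFO").strip().upper()
    return Settings(bot_token, backend, dg_api_key, log_level)
```

Taking the mapping as a parameter keeps the function easy to test without touching the real process environment.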
By default the app tries to run ffmpeg from the system PATH.
If you deploy to a server or use a custom ffmpeg installation, you can manually specify the path to the executable:
# Linux / macOS
FFMPEG_PATH=/usr/local/bin/ffmpeg
# Windows
FFMPEG_PATH=C:\ffmpeg\bin\ffmpeg.exe
If FFMPEG_PATH is set but invalid, the app falls back to searching for ffmpeg in PATH.
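
The lookup order described above can be sketched like this. The function name `resolve_ffmpeg` is hypothetical, and treating "invalid" as "not an existing executable file" is an assumption about what the app checks.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_ffmpeg(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the ffmpeg executable to use, or None if none can be found."""
    configured = env.get("FFMPEG_PATH", "").strip()
    # Use the configured path only if it points at an existing executable file.
    if configured and os.path.isfile(configured) and os.access(configured, os.X_OK):
        return configured
    # Otherwise fall back to searching the system PATH.
    return shutil.which("ffmpeg")
```

A `None` result means ffmpeg is neither configured correctly nor on PATH, which a caller would typically report as a startup error.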
python main.py
The main.py file is the application entry point and initializes
logging, configuration, and bot handlers.
This mode uses Telegram long polling and is intended for local development and debugging.
For cloud deployments, BubbleVoice can run in webhook mode using FastAPI. In this mode, Telegram sends updates to the bot via HTTP POST requests instead of long polling.
Run the FastAPI application using uvicorn:
uvicorn webapp:app --host 0.0.0.0 --port 8000

The server exposes the following endpoints:
- POST /webhook: receives Telegram updates sent by the Telegram API
- GET /health: health check endpoint for cloud platforms
This mode is recommended for:
- cloud deployments
- containerized environments (Docker, PaaS)
- platforms with limited CPU resources where long polling is inefficient
For additional security, it is recommended to use a secret webhook path. This helps prevent random HTTP requests from reaching the webhook endpoint.
Example:
WEBHOOK_SECRET=my-super-secret-token

Webhook URL example:
https://your-domain.com/webhook/my-super-secret-token
In this case, Telegram sends updates to the secret URL, and requests to other paths do not reach the webhook handler.
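
One way the secret path could be checked is shown below. This is a sketch under assumptions: the helper `is_valid_webhook_path` is hypothetical and the project's FastAPI route may handle this differently. It uses `hmac.compare_digest` from the standard library so that the comparison takes the same time regardless of where the strings differ.

```python
import hmac

WEBHOOK_PREFIX = "/webhook/"


def is_valid_webhook_path(path: str, secret: str) -> bool:
    """Return True only if path is /webhook/<secret> for the configured secret."""
    if not secret or not path.startswith(WEBHOOK_PREFIX):
        return False
    candidate = path[len(WEBHOOK_PREFIX):]
    # Constant-time comparison avoids leaking the secret through timing differences.
    return hmac.compare_digest(candidate.encode("utf-8"), secret.encode("utf-8"))
```

A request handler could respond with 404 whenever this returns False, so a wrong guess looks the same as a missing route.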
- .env is intentionally excluded from git.
- Use .env.example for reference.
- For server deployments, FFMPEG_PATH is recommended if ffmpeg is not in PATH.

- Audio is sent to Deepgram as 16 kHz mono WAV.
- The nova-3-general model is used with automatic language detection.
- If Deepgram returns an error or times out, the bot logs the error and automatically falls back to local Whisper without crashing.
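
The fallback behavior in the last note can be sketched as a small wrapper. The names `transcribe_with_fallback`, `primary`, and `fallback` are hypothetical, as is the logger name; the real bot may structure this differently. The backends are passed in as plain callables so the sketch does not depend on the Deepgram or Whisper libraries.

```python
import logging
from typing import Callable

logger = logging.getLogger("bubblevoice.transcriber")  # hypothetical logger name

Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(wav_bytes: bytes, primary: Transcriber, fallback: Transcriber) -> str:
    """Try the primary backend; on any error, log it and use the fallback backend."""
    try:
        return primary(wav_bytes)
    except Exception:
        # Covers API errors and timeouts alike; the traceback goes to the log.
        logger.exception("Primary transcription backend failed, falling back to local Whisper")
        return fallback(wav_bytes)
```

With `primary` bound to a Deepgram call and `fallback` to local Whisper, a Deepgram outage degrades to slower local transcription rather than a failed reply.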


