A lightweight web application that receives a live audio stream from a single speaker, converts speech to text in real time, translates that text to a target language using an LLM, and broadcasts the translated transcript to many viewers with minimal latency.
- Live audio capture and streaming (browser-based, no PyAudio required)
- Real-time speech-to-text using Deepgram
- Translation using OpenAI GPT-4o
- Broadcasting to multiple viewers
- Low latency (<3s end-to-end)
- Scalable to thousands of viewers
- The broadcaster page captures microphone audio using the browser `MediaRecorder` API and encodes it as `audio/webm;codecs=opus` at 16 kHz mono.
- Encoded audio chunks are sent over a WebSocket to the server at `/ws/stream/{room_id}`.
- The server forwards the binary audio frames directly to Deepgram Live Transcription.
- As transcripts arrive from Deepgram, the server translates them with OpenAI and broadcasts caption messages to all viewers connected to `/ws/view/{room_id}`.
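The final step of the flow above, fanning one caption out to every viewer in a room, can be sketched with the standard library alone. This is a minimal illustration and not the project's actual code: the `RoomHub` class, the per-viewer queue, and the caption field names (`type`, `original`, `translated`) are all assumptions, and the Deepgram and OpenAI calls are left out entirely.

```python
import asyncio
import json
from collections import defaultdict


class RoomHub:
    """Hypothetical in-memory fan-out: one bounded queue per connected viewer."""

    def __init__(self) -> None:
        self._viewers = defaultdict(set)  # room_id -> set of asyncio.Queue

    def join(self, room_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=100)
        self._viewers[room_id].add(queue)
        return queue

    def leave(self, room_id: str, queue: asyncio.Queue) -> None:
        self._viewers[room_id].discard(queue)

    def broadcast(self, room_id: str, original: str, translated: str) -> int:
        """Queue one caption for every viewer in the room; return how many received it."""
        message = json.dumps(
            {"type": "caption", "original": original, "translated": translated}
        )
        delivered = 0
        for queue in list(self._viewers[room_id]):
            try:
                queue.put_nowait(message)
                delivered += 1
            except asyncio.QueueFull:
                # A slow viewer misses this caption instead of stalling the others.
                pass
        return delivered


async def demo() -> list:
    hub = RoomHub()
    viewer_a = hub.join("room-1")
    viewer_b = hub.join("room-1")
    hub.broadcast("room-1", "안녕하세요", "Hello")
    return [await viewer_a.get(), await viewer_b.get()]
```

In a real handler, each viewer WebSocket would loop on its queue and send every message it pulls. Dropping captions for a full queue is one possible policy; disconnecting the slow viewer is another.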
Notes:
- Audio capture is entirely in the browser. There is no server-side microphone capture and no need for PyAudio.
- A small HTTPS middleware adapts links for Cloud Run so assets load over HTTPS when deployed.
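One common way to make generated links use HTTPS behind a TLS-terminating proxy is to read the `X-Forwarded-Proto` header and rewrite the ASGI scope's scheme. The sketch below is an assumption about how such a middleware could look, not a copy of `app/middleware.py`; the class name `ForwardedProtoMiddleware` is hypothetical, and it assumes the proxy in front of the app sets that header.

```python
import asyncio


class ForwardedProtoMiddleware:
    """Hypothetical pure-ASGI middleware that trusts X-Forwarded-Proto from the proxy."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] in ("http", "websocket"):
            # ASGI headers are a list of (name, value) byte pairs with lowercase names.
            headers = dict(scope.get("headers", []))
            proto = headers.get(b"x-forwarded-proto", b"").decode("latin-1")
            if proto == "https":
                secure = "https" if scope["type"] == "http" else "wss"
                # Copy the scope so the caller's dict is not mutated.
                scope = dict(scope, scheme=secure)
        await self.app(scope, receive, send)
```

With FastAPI, a class like this could be registered through `app.add_middleware(ForwardedProtoMiddleware)`. Only trust the header when the app is reachable solely through the proxy, since clients can otherwise set it themselves.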
- Clone this repository
- Create a `.env` file with your API keys (see `.env.example`)
- Install dependencies: `pip install -r requirements.txt`
- Run the application: `uvicorn app.main:app --reload`
Create a .env file in the root directory with the following variables:
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
REDIS_URL=redis://localhost:6379
Notes:
- Redis is not required for the MVP; state is stored in-memory. The `REDIS_URL` is provided for future/production use.
- `PORT` is honored automatically on Cloud Run. You can also set `HOST`, `PORT`, and `DEBUG` if needed.
- Default translation is Korean -> English; adjust language defaults in `app/config.py`.
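As an illustration of how these variables might be read, here is a minimal settings loader using only the standard library. It is a sketch, not the contents of `app/config.py`: the `Settings` class, the `load_settings` function, and the fallback values for `HOST`, `PORT`, and `DEBUG` are assumptions chosen for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    deepgram_api_key: str
    openai_api_key: str
    redis_url: str
    host: str
    port: int
    debug: bool


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        deepgram_api_key=env.get("DEEPGRAM_API_KEY", ""),
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        host=env.get("HOST", "0.0.0.0"),       # assumed default
        port=int(env.get("PORT", "8000")),     # assumed default; Cloud Run sets PORT
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, for example `load_settings({"PORT": "8080"})`.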
unbabel/
├── app/
│   ├── __init__.py
│   ├── main.py                  # FastAPI application
│   ├── config.py                # Configuration settings
│   ├── middleware.py            # HTTPS adjustments for Cloud Run
│   ├── routes/                  # API routes
│   │   ├── __init__.py
│   │   ├── broadcast.py         # Broadcaster WebSocket: /ws/stream/{room_id}
│   │   ├── viewer.py            # Viewer WebSocket: /ws/view/{room_id}
│   │   └── pages.py             # Web page routes
│   ├── services/                # Business logic
│   │   ├── __init__.py
│   │   ├── stt.py               # Speech-to-text (Deepgram Live)
│   │   ├── translation.py       # Translation (OpenAI GPT-4o)
│   │   └── broadcast.py         # Broadcasting helper
│   ├── models/
│   │   ├── __init__.py
│   │   └── messages.py          # Message schemas
│   └── utils/
│       ├── __init__.py
│       ├── websocket.py         # WebSocket helpers
│       └── state.py             # In-memory room state (MVP)
├── static/
│   ├── css/
│   │   └── styles.css
│   └── js/
│       ├── broadcaster.js
│       └── viewer.js
├── templates/
│   ├── base.html
│   ├── base_with_protocol.html
│   ├── index.html
│   ├── broadcast.html
│   └── view.html
├── .env                         # Environment variables (not in repo)
├── .env.example                 # Example environment variables
├── requirements.txt             # Python dependencies
└── README.md                    # Project documentation
- Start the app and open the homepage.
- Click "Start Broadcast" to create a new room and open the broadcaster page.
- On the broadcaster page, click "Start" to grant mic permission and begin streaming.
- Share the viewer link on that page; viewers see original and translated captions live.
- Broadcaster audio stream: `ws://<host>/ws/stream/{room_id}`
- Viewer captions: `ws://<host>/ws/view/{room_id}`
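When the app is served over HTTPS, the matching WebSocket scheme is `wss` rather than `ws`. The helper below sketches how a client or test script might derive the endpoint URL from a page URL; the function name `websocket_url` and the `role` labels are hypothetical, while the two paths come from the endpoints listed above.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url(page_url: str, role: str, room_id: str) -> str:
    """Map an http(s) page URL to the ws(s) endpoint for a room."""
    paths = {"broadcaster": "/ws/stream/", "viewer": "/ws/view/"}
    parts = urlsplit(page_url)
    # https pages must use wss; browsers block insecure ws from secure pages.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, paths[role] + room_id, "", ""))
```

For a local run, `websocket_url("http://localhost:8000/", "viewer", "abc")` yields `ws://localhost:8000/ws/view/abc`.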
This application can be deployed to Google Cloud Run using the provided deployment script, `deploy-to-cloud-run.sh`.
- Google Cloud SDK installed
- Docker installed
- A Google Cloud Platform account with billing enabled
- A Google Cloud project created
- Make the deployment script executable: `chmod +x deploy-to-cloud-run.sh`
- Edit the script to set your Google Cloud project ID: `PROJECT_ID="your-gcp-project-id"` (replace the value with your GCP project ID)
- Create your `.env.yaml` file from the template: `cp .env.yaml.template .env.yaml`
- Edit the `.env.yaml` file to add your API keys and configuration.
- Run the deployment script: `./deploy-to-cloud-run.sh`
- The script will build and deploy your application to Google Cloud Run and provide you with a URL when complete.
The deployment script includes default scaling settings that you can modify:
- `MIN_INSTANCES`: Minimum number of instances (default: 1)
- `MAX_INSTANCES`: Maximum number of instances (default: 10)
- `MEMORY`: Memory allocation per instance (default: 512Mi)
- `CPU`: CPU allocation per instance (default: 1)
- `CONCURRENCY`: Maximum concurrent requests per instance (default: 80)
Edit these values in the script based on your expected traffic and requirements.
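A rough way to size these settings is to count connection slots. The arithmetic below assumes that every open viewer or broadcaster WebSocket occupies one concurrency slot for as long as it stays connected; the function names are hypothetical, and the 5,000-viewer figure is only an example.

```python
import math


def max_connections(max_instances: int, concurrency: int) -> int:
    """Upper bound on simultaneously open connections across all instances."""
    return max_instances * concurrency


def instances_needed(viewers: int, concurrency: int) -> int:
    """Smallest instance count whose combined slots cover the given viewers."""
    return math.ceil(viewers / concurrency)


default_capacity = max_connections(10, 80)    # 800 slots with the script defaults
needed_at_80 = instances_needed(5000, 80)     # 63 instances at CONCURRENCY=80
needed_at_250 = instances_needed(5000, 250)   # 20 instances at CONCURRENCY=250
```

Under that assumption, the defaults top out at 800 simultaneous connections, so an audience in the thousands would need a higher `MAX_INSTANCES`, a higher `CONCURRENCY`, or both. This counts slots only; it does not address whether room state is shared between instances.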
MIT