dict-ai-te lets you record voice notes, transcribe them (via the OpenAI API), and optionally translate them to a target language. Use it in three ways:
- A native desktop app with GTK 4 (Linux/macOS)
- A browser-based Web UI powered by Flask
- A Rust version (runs natively on Linux, macOS, and Windows, and on the web with Wasm/WebGL)
The GTK app and the Web UI share the same engine and settings.
Screenshots: Python GTK (Ubuntu), Python GTK (macOS), Python Web (Flask), and Rust (egui; shown on Ubuntu).
- Record audio notes directly from your microphone (desktop and web).
- See real-time status and elapsed recording time.
- Real-time audio level bar during recording.
- Automatic transcription using the OpenAI Whisper API.
- Edit or correct transcribed text in the main window.
- Optional translation to a selected language.
- Save transcripts as plain text files.
- Copy transcripts to your clipboard with a single click.
- Simple configuration using `.env` or environment variables for the OpenAI API key.
- Consistent settings across the GTK app and Web UI.
Prerequisites:
- Python 3.8 or higher (Python 3.12+ recommended)
- Linux, macOS, or Windows
- uv for fast dependency management
git clone https://github.com/soyrochus/dict-ai-te.git
cd dict-ai-te
uv venv .venv
source .venv/bin/activate
Make sure you have a pyproject.toml file in the project root (define dependencies as needed).
uv sync
Note: On Linux, the GTK 4 library used by the application requires several system packages:
sudo apt update
sudo apt install -y \
libgtk-4-dev \
libgirepository-2.0-dev \
libcairo2-dev \
pkg-config \
python3-dev \
python3-gi \
python3-gi-cairo \
gir1.2-gtk-4.0 \
libportaudio2
On macOS, you need to install the dependencies using Homebrew:
brew install gtk4 pygobject3 portaudio
python -m dictaite
or use the script in the bin directory:
<source dir>>bin/dictaite
Note that the script needs to have the executable permission set.
You can also run the Web UI in your browser (see the next section for first-time setup):
bin/dictaite-web
If the web extras are not installed yet, sync them first and then start the server:
uv sync --extra ui-web
bin/dictaite-web
Visit http://localhost:5000 to use the browser interface.
Quick reference:
- GTK desktop app: `bin/dictaite`
- Web UI (Flask): `bin/dictaite-web`
There’s also a native Rust version of dict-ai-te. It lives in the Rust crate at the project root (see Cargo.toml) with sources under src/. It’s built with eframe/egui (rendered via wgpu) and uses:
- `cpal`/`rodio` for audio input/output
- `rfd` for native file dialogs
- `arboard` for clipboard
- `reqwest` for OpenAI HTTP calls (API key from environment or `.env`)
You can run it on Linux, macOS, and Windows. On launch it reads OPENAI_API_KEY from your environment or a local .env file.
Before building, install the development libraries this project depends on:
sudo apt update
sudo apt install -y \
build-essential pkg-config \
libssl-dev \
libasound2-dev libjack-jackd2-dev \
libx11-dev libxi-dev libxcb1-dev libxcb-render0-dev libxcb-shape0-dev libxcb-xfixes0-dev \
libxkbcommon-dev libwayland-dev libgl1-mesa-dev libudev-dev \
libvulkan1 vulkan-tools mesa-vulkan-drivers libvulkan-dev \
libgtk-3-dev \
xclip wl-clipboard
- build-essential, pkg-config – compilers, linker, and pkg-config metadata.
- libssl-dev – required by `openssl-sys` (via `reqwest`).
- libasound2-dev, libjack-jackd2-dev – required by `alsa-sys` and JACK backend in `cpal`/`rodio`.
- X11/Wayland/GL stack (libx11-dev … libgl1-mesa-dev …) – required by `eframe`/`egui` with `wgpu`.
- libudev-dev – device discovery for `wgpu`.
- libvulkan1, mesa-vulkan-drivers, libvulkan-dev – Vulkan runtime and headers for `wgpu`.
- libgtk-3-dev – needed by `rfd` for native file dialogs.
- xclip, wl-clipboard – runtime helpers for `arboard` clipboard integration.
Make sure you have the Rust toolchain installed (via https://rustup.rs/). Then:
cargo build --release
Run it either via cargo:
cargo run --release
…or by launching the built binary directly:
./target/release/dict_ai_te
Environment setup (any of the app variants):
export OPENAI_API_KEY=your_key_here
# or create a .env file in the project root with OPENAI_API_KEY=...
The Flask interface mirrors the GTK layout using TailwindCSS and vanilla JavaScript. It supports recording, live level metering, Whisper transcription, optional translation, text-to-speech previews, download/copy helpers and keyboard shortcuts.
- Python 3.12+
- `ffmpeg` (required by `pydub` to transcode browser recordings to WAV)
- OpenAI API key exported as `OPENAI_API_KEY` or placed in a `.env` file
All dependencies are installed with the desktop application.
Use the convenience script or run the module directly:
bin/dictaite-web
# or
uv run -m dictaite.ui_web.app --host 0.0.0.0 --port 8080
Navigate to http://localhost:8080. The browser will prompt for microphone permissions when you start recording. MediaRecorder produces webm/ogg blobs which are transcoded to 16 kHz WAV on the server before reaching Whisper.
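The transcoding step can be illustrated with a minimal sketch. It calls `ffmpeg` directly through `subprocess` rather than through `pydub`, so it is not the project's own code path; the function names and the downmix to mono are assumptions made for illustration.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def ffmpeg_wav_command(src: Path, dst: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg argv that converts `src` to mono WAV at `sample_rate` Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input container (webm, ogg, ...) is auto-detected
        "-ac", "1",               # downmix to one channel (an assumption, not from the README)
        "-ar", str(sample_rate),  # resample to the target rate
        str(dst),                 # the .wav extension selects the WAV output format
    ]


def transcode_to_wav(src: Path, dst: Path) -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_wav_command(src, dst), check=True, capture_output=True)
```

Calling `transcode_to_wav(Path("note.webm"), Path("note.wav"))` requires `ffmpeg` on the `PATH`; building the argument list separately makes the command easy to inspect without running it.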
- `POST /api/transcribe` – multipart upload with `audio`, optional `language`, `translate`, `target_lang`. Returns JSON with `text`, `translatedText?`, `durationMs`.
- `POST /api/tts-test` – JSON `{ gender, text, voice? }`, returns `audio/wav` preview bytes.
- `POST /api/settings` – JSON payload to persist shared settings; `GET /api/settings` fetches current values.
- `GET /api/health` – simple readiness probe.
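As a rough client-side sketch of the `/api/transcribe` upload, the following uses only the standard library to build the multipart body. The helper names, the `note.webm` filename, the `audio/webm` MIME type, and the base URL are assumptions; the exact value format the server expects for `translate` is not documented here.

```python
from __future__ import annotations

import json
import urllib.request
import uuid


def encode_multipart(fields: dict[str, str], filename: str, audio: bytes,
                     mime: str = "audio/webm") -> tuple[bytes, str]:
    """Encode text fields plus one `audio` file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio: bytes, base_url: str = "http://localhost:8080",
               **fields: str) -> dict:
    """POST a recording to /api/transcribe and return the decoded JSON reply."""
    body, content_type = encode_multipart(fields, "note.webm", audio)
    request = urllib.request.Request(
        f"{base_url}/api/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, something like `transcribe(open("note.webm", "rb").read(), language="en")` should return a dictionary containing `text` and `durationMs`.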
CORS is disabled by default. To embed the UI in a page served from another domain, set DICTAITE_ENABLE_CORS=true and DICTAITE_CORS_ORIGIN=... in the Flask configuration. A placeholder hook (DICTAITE_RATE_LIMITER) is left in app.py so you can plug in your preferred rate-limiting middleware.
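How the two CORS settings might translate into response headers can be sketched as a plain function. This is not taken from `app.py`: the function name is hypothetical, and it assumes `DICTAITE_CORS_ORIGIN` holds either a single origin or `*`.

```python
from __future__ import annotations


def cors_headers(config: dict, request_origin: str | None) -> dict[str, str]:
    """Return the CORS response headers for a request, or {} when CORS is off."""
    enabled = str(config.get("DICTAITE_ENABLE_CORS", "false")).lower() == "true"
    allowed = config.get("DICTAITE_CORS_ORIGIN")
    if not enabled or not allowed or request_origin is None:
        return {}
    if allowed != "*" and allowed != request_origin:
        return {}  # origin does not match the configured one: send no CORS headers
    return {
        "Access-Control-Allow-Origin": allowed,
        "Vary": "Origin",  # keep caches from reusing the reply across origins
    }
```

In a Flask app, a function like this would typically be called from an `after_request` handler that copies the returned headers onto the response.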
- `Space` – start/stop recording (ignored when the textarea is focused).
- `Ctrl/Cmd+C` – copy transcript.
- `Ctrl/Cmd+S` – download transcript as `.txt`.
The Play button synthesizes audio for the current transcript via /api/tts-test using the chosen voice gender.
Settings are stored in ~/.dictaite/settings.json and shared with the GTK application. The web form uses the same voices and
language lists defined in dictaite/ui_common.py. Use the Play buttons to preview voice choices before saving.
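A shared JSON settings file of this kind can be read and written with a few lines of standard-library code. The sketch below is illustrative only: the keys in `DEFAULTS` and the function names are hypothetical, and the real schema lives in the project's own code.

```python
from __future__ import annotations

import json
from pathlib import Path

SETTINGS_PATH = Path.home() / ".dictaite" / "settings.json"

# Hypothetical defaults; the actual keys are defined by the project, not here.
DEFAULTS = {"target_lang": "en", "translate": False}


def load_settings(path: Path = SETTINGS_PATH) -> dict:
    """Return saved settings merged over DEFAULTS; a missing file yields DEFAULTS."""
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        saved = {}
    return {**DEFAULTS, **saved}


def save_settings(settings: dict, path: Path = SETTINGS_PATH) -> None:
    """Write settings as JSON, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

Because both front ends would read the same file, a value saved from one shows up in the other the next time settings are loaded.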
- Create a `.env` file in the project root containing your OpenAI API key: `OPENAI_API_KEY=your_key_here`
- Alternatively, set the `OPENAI_API_KEY` environment variable: `export OPENAI_API_KEY=your_key_here`
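The two configuration options can be combined in a small lookup routine. The following is a minimal sketch, not the project's loader: the function names are hypothetical, the `.env` parser handles only simple `KEY=value` lines, and giving the environment variable precedence over the file is an assumption.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return values


def resolve_api_key(dotenv_path: Path = Path(".env")) -> str:
    """Prefer the environment variable, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or read_dotenv(dotenv_path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in the environment or .env")
    return key
```

Failing early with a clear error when neither source provides a key avoids a less obvious authentication failure at the first API call.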
Everyone is invited and welcome to contribute: open issues, propose pull requests, share ideas, or help improve documentation.
Participation is open to all, regardless of background or viewpoint.
This project follows the FOSS Pluralism Manifesto, which affirms respect for people, freedom to critique ideas, and space for diverse perspectives.
Copyright (c) 2025, Iwan van der Kleijn
This project is licensed under the MIT License. See the LICENSE file for details.



