Native macOS speech-to-text. Runs entirely on your machine — no cloud, no API keys, no data leaves your device.
Powered by NVIDIA Parakeet TDT 0.6B running on Apple Neural Engine via FluidAudio.
- Native macOS app — lightweight Swift menu bar app, no Electron, no Python
- Push-to-talk — hold the Globe (fn) key to record, release to transcribe and paste
- Fast and accurate — Parakeet TDT 0.6B runs on Apple Neural Engine via CoreML
- Types into any app — transcribed text is pasted directly into the active application
- Dynamic waveform — floating pill shows a live audio waveform while recording
- Fully local — everything runs on-device, nothing is sent to the cloud
- macOS 14.0 (Sonoma) or later
- Apple Silicon Mac (M1 or later recommended for Neural Engine)
```sh
git clone https://github.com/BradleyFarquharson/Listen.git
cd Listen
chmod +x build.sh
./build.sh
```

This builds `Listen.app` in the `dist/` directory. Copy it to `/Applications/`:

```sh
cp -r dist/Listen.app /Applications/
```

On first launch, Listen downloads the Parakeet model (~200 MB). This only happens once.
- Launch Listen — a microphone icon appears in the menu bar
- Grant Input Monitoring and Microphone permissions when prompted
- Open any text field (TextEdit, browser, Slack, etc.)
- Hold the Globe (fn) key — a floating waveform pill appears at the bottom of the screen
- Speak naturally
- Release the Globe key — your speech is transcribed and typed into the active app
Listen needs two macOS permissions:
| Permission | Why | Where to grant |
|---|---|---|
| Input Monitoring | Globe (fn) key detection | System Settings > Privacy & Security > Input Monitoring |
| Microphone | Audio capture | System Settings > Privacy & Security > Microphone |
macOS prompts for both automatically on first use.
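If you want to verify permission state programmatically, a minimal sketch looks like this. The helper name is hypothetical (it is not taken from Permissions.swift); it combines `CGPreflightListenEventAccess` for Input Monitoring with `AVCaptureDevice.authorizationStatus(for:)` for the microphone:

```swift
import AVFoundation   // microphone authorization status
import CoreGraphics   // Input Monitoring preflight

// Hypothetical helper: report both permissions without prompting.
// CGPreflightListenEventAccess() returns true once Input Monitoring is
// granted; CGRequestListenEventAccess() would trigger the system prompt.
func permissionStatus() -> (inputMonitoring: Bool, microphone: Bool) {
    let input = CGPreflightListenEventAccess()
    let mic = AVCaptureDevice.authorizationStatus(for: .audio) == .authorized
    return (input, mic)
}
```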
```
Listen/
├── App/
│   ├── ListenApp.swift             — @main SwiftUI MenuBarExtra entry point
│   └── AppState.swift              — Central orchestrator
├── Audio/
│   ├── AudioCaptureService.swift   — AVAudioEngine mic capture (16kHz mono)
│   └── VoiceActivityDetector.swift — RMS energy-based speech segmentation
├── Transcription/
│   └── WhisperService.swift        — FluidAudio / Parakeet TDT wrapper
├── Input/
│   ├── GlobeKeyMonitor.swift       — CGEvent tap for Globe (fn) key
│   ├── HotkeyManager.swift         — Push-to-talk / toggle-mute modes
│   └── TextInserter.swift          — Clipboard + CGEvent Cmd+V paste
├── UI/
│   ├── MenuBarView.swift           — Menu bar dropdown
│   ├── RecordingPillView.swift     — Dynamic waveform visualization
│   ├── RecordingPillWindow.swift   — Floating NSPanel overlay
│   └── SettingsView.swift          — Settings form
├── Config/
│   ├── AppConfig.swift             — @AppStorage-backed settings
│   └── Permissions.swift           — Permission checks
└── Utilities/
    └── SoundEffects.swift          — Start/stop audio cues
```
- Globe (fn) key press detected via CGEvent tap (Input Monitoring permission)
- AVAudioEngine captures microphone audio, resampled to 16kHz mono
- Voice activity detection segments speech using RMS energy thresholds
- Each speech segment is transcribed by Parakeet TDT via CoreML on Apple Neural Engine
- Transcribed text is inserted into the active app via clipboard + simulated Cmd+V
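The hotkey step boils down to inspecting the modifier flags carried by `.flagsChanged` events: the Globe/fn key surfaces as the `.maskSecondaryFn` flag. A sketch of that check (the helper name is illustrative, not from GlobeKeyMonitor.swift):

```swift
import CoreGraphics

// Illustrative sketch: the Globe/fn key appears as the .maskSecondaryFn
// modifier flag on .flagsChanged events seen by a CGEvent tap.
func isGlobeKeyDown(_ flags: CGEventFlags) -> Bool {
    flags.contains(.maskSecondaryFn)
}

// In the real tap callback, a false-to-true transition would start
// recording, and true-to-false would stop it and kick off transcription.
```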
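The RMS energy gating in the voice-activity step can be sketched in a few lines of pure Swift. The threshold below is an illustrative placeholder, not the value VoiceActivityDetector.swift actually uses:

```swift
import Foundation

// Root-mean-square energy of a frame of audio samples.
func rms(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

// A frame counts as speech when its energy clears the threshold.
// 0.02 is a placeholder; real tuning depends on mic gain and noise floor.
func isSpeech(_ samples: [Float], threshold: Float = 0.02) -> Bool {
    rms(samples) > threshold
}
```

Consecutive speech frames are then grouped into segments, and each segment is handed to the transcriber.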
| Component | Technology |
|---|---|
| Language | Swift |
| UI | SwiftUI MenuBarExtra + NSPanel |
| STT model | NVIDIA Parakeet TDT 0.6B v3 |
| Inference | CoreML / Apple Neural Engine via FluidAudio |
| Audio | AVAudioEngine |
| Hotkey | CGEvent tap (Input Monitoring) |
| Text insertion | CGEvent keyboard simulation |
| Config | @AppStorage (UserDefaults) |
The NVIDIA Parakeet TDT model is licensed under CC-BY-4.0.