Native macOS speech-to-text. Runs entirely on your machine — no cloud, no API keys, no data leaves your device.
Powered by NVIDIA Parakeet TDT 0.6B running on Apple Neural Engine via FluidAudio.
- Native macOS app — lightweight Swift menu bar app, no Electron, no Python
- Push-to-talk — hold the Globe (fn) key to record, release to transcribe and paste
- Fast and accurate — Parakeet TDT 0.6B runs on Apple Neural Engine via CoreML
- Types into any app — transcribed text is pasted directly into the active application
- Dynamic waveform — floating pill shows a live audio waveform while recording
- Fully local — everything runs on-device, nothing is sent to the cloud
- macOS 14.0 (Sonoma) or later
- Apple Silicon Mac (M1 or later recommended for Neural Engine)
```sh
git clone https://github.com/BradleyFarquharson/Listen.git
cd Listen
chmod +x build.sh
./build.sh
```

This builds `Listen.app` in the `dist/` directory. Copy it to `/Applications/`:

```sh
cp -r dist/Listen.app /Applications/
```

On first launch, Listen downloads the Parakeet model (~200 MB). This only happens once.
- Launch Listen — a microphone icon appears in the menu bar
- Grant Input Monitoring and Microphone permissions when prompted
- Open any text field (TextEdit, browser, Slack, etc.)
- Hold the Globe (fn) key — a floating waveform pill appears at the bottom of the screen
- Speak naturally
- Release the Globe key — your speech is transcribed and typed into the active app
Listen needs two macOS permissions:
| Permission | Why | Where to grant |
|---|---|---|
| Input Monitoring | Globe (fn) key detection | System Settings > Privacy & Security > Input Monitoring |
| Microphone | Audio capture | System Settings > Privacy & Security > Microphone |
macOS prompts for both automatically on first use.
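If you want to verify permission state programmatically, a minimal sketch looks like this. The helper name is hypothetical (it is not taken from Permissions.swift); it combines `CGPreflightListenEventAccess` for Input Monitoring with `AVCaptureDevice.authorizationStatus(for:)` for the microphone:

```swift
import AVFoundation   // microphone authorization status
import CoreGraphics   // Input Monitoring preflight

// Hypothetical helper: report both permissions without prompting.
// CGPreflightListenEventAccess() returns true once Input Monitoring is
// granted; CGRequestListenEventAccess() would trigger the system prompt.
func permissionStatus() -> (inputMonitoring: Bool, microphone: Bool) {
    let input = CGPreflightListenEventAccess()
    let mic = AVCaptureDevice.authorizationStatus(for: .audio) == .authorized
    return (input, mic)
}
```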
```
Listen/
├── App/
│   ├── ListenApp.swift             — @main SwiftUI MenuBarExtra entry point
│   └── AppState.swift              — Central orchestrator
├── Audio/
│   ├── AudioCaptureService.swift   — AVAudioEngine mic capture (16kHz mono)
│   └── VoiceActivityDetector.swift — RMS energy-based speech segmentation
├── Transcription/
│   └── WhisperService.swift        — FluidAudio / Parakeet TDT wrapper
├── Input/
│   ├── GlobeKeyMonitor.swift       — CGEvent tap for Globe (fn) key
│   ├── HotkeyManager.swift         — Push-to-talk / toggle-mute modes
│   └── TextInserter.swift          — Clipboard + CGEvent Cmd+V paste
├── UI/
│   ├── MenuBarView.swift           — Menu bar dropdown
│   ├── RecordingPillView.swift     — Dynamic waveform visualization
│   ├── RecordingPillWindow.swift   — Floating NSPanel overlay
│   └── SettingsView.swift          — Settings form
├── Config/
│   ├── AppConfig.swift             — @AppStorage-backed settings
│   └── Permissions.swift           — Permission checks
└── Utilities/
    └── SoundEffects.swift          — Start/stop audio cues
```
- Globe (fn) key press detected via CGEvent tap (Input Monitoring permission)
- AVAudioEngine captures microphone audio, resampled to 16kHz mono
- Voice activity detection segments speech using RMS energy thresholds
- Each speech segment is transcribed by Parakeet TDT via CoreML on Apple Neural Engine
- Transcribed text is inserted into the active app via clipboard + simulated Cmd+V
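The hotkey step boils down to inspecting the modifier flags carried by `.flagsChanged` events: the Globe/fn key surfaces as the `.maskSecondaryFn` flag. A sketch of that check (the helper name is illustrative, not from GlobeKeyMonitor.swift):

```swift
import CoreGraphics

// Illustrative sketch: the Globe/fn key appears as the .maskSecondaryFn
// modifier flag on .flagsChanged events seen by a CGEvent tap.
func isGlobeKeyDown(_ flags: CGEventFlags) -> Bool {
    flags.contains(.maskSecondaryFn)
}

// In the real tap callback, a false-to-true transition would start
// recording, and true-to-false would stop it and kick off transcription.
```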
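The RMS energy gating in the voice-activity step can be sketched in a few lines of pure Swift. The threshold below is an illustrative placeholder, not the value VoiceActivityDetector.swift actually uses:

```swift
import Foundation

// Root-mean-square energy of a frame of audio samples.
func rms(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

// A frame counts as speech when its energy clears the threshold.
// 0.02 is a placeholder; real tuning depends on mic gain and noise floor.
func isSpeech(_ samples: [Float], threshold: Float = 0.02) -> Bool {
    rms(samples) > threshold
}
```

Consecutive speech frames are then grouped into segments, and each segment is handed to the transcriber.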
| Component | Technology |
|---|---|
| Language | Swift |
| UI | SwiftUI MenuBarExtra + NSPanel |
| STT model | NVIDIA Parakeet TDT 0.6B v3 |
| Inference | CoreML / Apple Neural Engine via FluidAudio |
| Audio | AVAudioEngine |
| Hotkey | CGEvent tap (Input Monitoring) |
| Text insertion | CGEvent keyboard simulation |
| Config | @AppStorage (UserDefaults) |
The NVIDIA Parakeet TDT model is licensed under CC-BY-4.0.