VoxScript

Local AI-powered dictation for macOS using WhisperKit


┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   🎤 VoxScript - Local Dictation for macOS                     │
│                                                                 │
│   ┌─────────────┐    ┌──────────────┐    ┌─────────────┐       │
│   │   Record    │ -> │  WhisperKit  │ -> │  Insert at  │       │
│   │   Audio     │    │  Transcribe  │    │   Cursor    │       │
│   └─────────────┘    └──────────────┘    └─────────────┘       │
│                             │                                   │
│                             ▼                                   │
│                    ┌──────────────┐                            │
│                    │   Ollama     │                            │
│                    │  (Optional)  │                            │
│                    └──────────────┘                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Features

  • 100% Local Processing - All transcription happens on-device using Apple Silicon
  • Global Hotkeys - Press ⌘⇧Space anywhere to start/stop recording
  • Multiple Recording Modes - Toggle, Push-to-Talk, or Continuous with silence detection
  • Optional Post-Processing - Clean up text with local Ollama LLM
  • Menu Bar App - Runs quietly in the background
  • Works Everywhere - Text insertion works in standard apps AND terminal emulators

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     VoxScript.app                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                    VoxScriptApp                        │  │
│  │                 (App Entry Point)                      │  │
│  └──────────────────────────────────────────────────────┘  │
│                           │                                  │
│           ┌───────────────┼───────────────┐                 │
│           ▼               ▼               ▼                 │
│  ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐      │
│  │  StatusBar  │ │ FloatingPanel│ │   HotkeyManager │      │
│  │ Controller  │ │  Controller  │ │                 │      │
│  └─────────────┘ └──────────────┘ └─────────────────┘      │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                    Core Services                     │  │
│  ├──────────────┬────────────────────┬──────────────────┤  │
│  │AudioRecorder │TranscriptionEngine │  PostProcessor   │  │
│  │ (AVAudioEng) │    (WhisperKit)    │     (Ollama)     │  │
│  └──────────────┴────────────────────┴──────────────────┘  │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                      Models                            │  │
│  ├─────────────┬─────────────┬─────────────────────────┤  │
│  │  AppState   │  Settings   │  TranscriptionResult    │  │
│  └─────────────┴─────────────┴─────────────────────────┘  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Requirements

  • macOS 14.0+ (Sonoma or later)
  • Apple Silicon (M1/M2/M3/M4)
  • ~1-2 GB of disk space for the Whisper model

Installation

Download Release

Download the latest DMG from the Releases page.

Build from Source

# Clone the repository
git clone https://github.com/davidcv5/VoxScript.git
cd VoxScript

# Open in Xcode
open VoxScript.xcodeproj

# Build and Run (⌘R)

Or build via command line:

xcodebuild -project VoxScript.xcodeproj -scheme VoxScript -configuration Release

Dependencies

Package            Version   Purpose
WhisperKit         0.15.0+   Speech-to-text engine
KeyboardShortcuts  2.0.0+    Global hotkey handling

Optional

  • Ollama - For post-processing text cleanup (install separately)
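
If you enable post-processing, the cleanup request VoxScript sends to a local Ollama server can be sketched as below. The endpoint and JSON fields follow Ollama's documented `/api/generate` API; the model name, prompt, and helper name are illustrative, not VoxScript's actual implementation:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Request body for Ollama's /api/generate endpoint.
struct OllamaRequest: Codable {
    let model: String
    let prompt: String
    let stream: Bool
}

// Build a cleanup request for a raw transcript (hypothetical helper).
func makeCleanupRequest(transcript: String, model: String = "llama3.2") throws -> URLRequest {
    let body = OllamaRequest(
        model: model,
        prompt: "Fix punctuation and capitalization, change nothing else:\n\(transcript)",
        stream: false)
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

Ollama must be running (`ollama serve`) and the model pulled (`ollama pull llama3.2`) for the request to succeed.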

Usage

  1. Launch VoxScript - It appears in the menu bar
  2. Press ⌘⇧Space to start recording
  3. Speak your text
  4. Press ⌘⇧Space again to stop and transcribe
  5. Text is automatically inserted at cursor
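
Under the hood, steps 2-4 reduce to a single WhisperKit call. A minimal sketch, assuming WhisperKit's async `transcribe(audioPath:)` API (the model name and file path are placeholders, and the exact signatures may differ between WhisperKit versions):

```swift
import WhisperKit

// Load a model (downloaded on first use) and transcribe a recorded WAV file.
// "large-v3-turbo" and the path are illustrative.
let pipe = try await WhisperKit(model: "large-v3-turbo")
let results = try await pipe.transcribe(audioPath: "/tmp/voxscript-recording.wav")
print(results.map(\.text).joined(separator: " "))
```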

First Run

On first launch, VoxScript will:

  1. Request Microphone permission
  2. Request Accessibility permission (for global shortcuts)
  3. Download the default Whisper model (~1GB)
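
Both permission prompts use standard macOS APIs; a sketch of the checks (VoxScript's actual onboarding flow may differ):

```swift
import AVFoundation
import ApplicationServices

// Microphone: triggers the system prompt the first time it runs.
AVCaptureDevice.requestAccess(for: .audio) { granted in
    print(granted ? "Microphone granted" : "Microphone denied")
}

// Accessibility: returns false (and shows a prompt) until the user enables
// the app under System Settings → Privacy & Security → Accessibility.
let opts = [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: true] as CFDictionary
let trusted = AXIsProcessTrustedWithOptions(opts)
print("Accessibility trusted:", trusted)
```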

Keyboard Shortcuts

Action            Shortcut
Toggle Recording  ⌘⇧Space
Cancel Recording  Escape
Open Settings     ⌘,

Recording Modes

Mode          Behavior
Toggle        Press to start, press again to stop
Push-to-Talk  Hold key to record, release to transcribe
Continuous    Auto-stops after detecting silence (2s)
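
Continuous mode's auto-stop can be sketched as an RMS threshold with a running silence timer. The threshold and window values below are illustrative, not VoxScript's actual settings:

```swift
import Foundation

/// Tracks how long the input has stayed below a loudness threshold.
struct SilenceDetector {
    let threshold: Float          // RMS level treated as silence (illustrative)
    let stopAfter: TimeInterval   // e.g. 2 seconds, per Continuous mode
    private(set) var silentFor: TimeInterval = 0

    init(threshold: Float = 0.01, stopAfter: TimeInterval = 2.0) {
        self.threshold = threshold
        self.stopAfter = stopAfter
    }

    /// Feed one audio buffer; returns true when recording should stop.
    mutating func process(samples: [Float], duration: TimeInterval) -> Bool {
        let meanSquare = samples.reduce(0) { $0 + $1 * $1 } / Float(max(samples.count, 1))
        let rms = meanSquare.squareRoot()
        silentFor = rms < threshold ? silentFor + duration : 0
        return silentFor >= stopAfter
    }
}
```

In the real app the buffers would come from an AVAudioEngine tap; here they are plain sample arrays to keep the logic visible.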

Available Models

Model           Size    Speed      Accuracy
large-v3-turbo  ~950MB  Fast       Excellent
large-v3        ~1.5GB  Slower     Best
small.en        ~460MB  Very fast  Good (English)
base            ~140MB  Fastest    Basic
tiny            ~75MB   Instant    Testing only

Data Flow

┌──────────┐     ┌───────────────┐     ┌──────────────┐
│ Mic/User │────▶│ AudioRecorder │────▶│ Temp WAV File│
└──────────┘     └───────────────┘     └──────────────┘
                                              │
                                              ▼
                       ┌──────────────────────────────────────┐
                       │        TranscriptionEngine           │
                       │  ┌──────────────────────────────┐   │
                       │  │         WhisperKit            │   │
                       │  │  ┌────────┐   ┌───────────┐  │   │
                       │  │  │ Model  │ + │ CoreML/ANE│  │   │
                       │  │  └────────┘   └───────────┘  │   │
                       │  └──────────────────────────────┘   │
                       └──────────────────────────────────────┘
                                              │
                                              ▼
                               ┌──────────────────────┐
                               │  TranscriptionResult │
                               │  { text, language }  │
                               └──────────────────────┘
                                              │
                         ┌────────────────────┼────────────────────┐
                         │                    │                    │
                         ▼                    │                    ▼
              ┌─────────────────┐            │         ┌─────────────────┐
              │ Post-Processing │◀───────────┘         │ ClipboardManager│
              │    (Optional)   │                      │                 │
              │ ┌─────────────┐ │                      │  ┌───────────┐  │
              │ │   Ollama    │ │                      │  │  Paste/   │  │
              │ │   llama3.2  │ │─────────────────────▶│  │  Insert   │  │
              │ └─────────────┘ │                      │  └───────────┘  │
              └─────────────────┘                      └─────────────────┘
                                                                │
                                                                ▼
                                                       ┌──────────────┐
                                                       │ Target App   │
                                                       │ (at cursor)  │
                                                       └──────────────┘

Settings

Access via menu bar icon → Settings (⌘,)

  • General: Launch at login, sounds, floating indicator
  • Transcription: Model selection, language, post-processing
  • Shortcuts: Customize keyboard shortcuts
  • Advanced: Insert directly, trailing newline, silence detection

Privacy

  • All processing happens locally on your device
  • No data is sent to any cloud service
  • Audio is only saved temporarily during transcription
  • No telemetry or usage tracking

Project Structure

VoxScript/
├── Package.swift                    # Swift Package Manager dependencies
├── VoxScript.xcodeproj/
├── VoxScript/
│   ├── VoxScriptApp.swift           # Main app entry point
│   ├── Info.plist                   # App configuration
│   ├── VoxScript.entitlements       # Audio, automation entitlements
│   ├── Core/
│   │   ├── TranscriptionEngine.swift   # WhisperKit wrapper (singleton)
│   │   ├── AudioRecorder.swift         # AVAudioEngine recording
│   │   ├── HotkeyManager.swift         # KeyboardShortcuts wrapper
│   │   ├── ClipboardManager.swift      # Text insertion with terminal detection
│   │   └── PostProcessor.swift         # Ollama integration
│   ├── UI/
│   │   ├── FloatingPanel/              # Recording indicator
│   │   ├── Settings/                   # Settings tabs
│   │   ├── Onboarding/                 # First-run setup
│   │   └── MenuBar/                    # Status bar controller
│   ├── Models/
│   │   ├── AppState.swift              # Observable app state
│   │   ├── Settings.swift              # User preferences
│   │   └── TranscriptionResult.swift   # Result model
│   └── Utilities/
│       ├── Permissions.swift           # Permission helpers
│       └── SoundPlayer.swift           # Audio feedback
└── VoxScriptTests/                     # Unit tests

Troubleshooting

Text not inserting in Terminal/iTerm2

VoxScript automatically detects terminal apps and uses a different insertion method. If it's still not working:

  1. Open Settings → Advanced
  2. Disable "Insert directly"
  3. Manually paste with ⌘V after transcription
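
The terminal detection itself can be as simple as checking the frontmost app's bundle identifier; a sketch using standard AppKit APIs (the bundle-ID list is illustrative, not VoxScript's actual list):

```swift
import AppKit

// Known terminal emulators, keyed by bundle identifier (illustrative set).
let terminalBundleIDs: Set<String> = [
    "com.apple.Terminal",
    "com.googlecode.iterm2",
]

// True when the app that will receive the text is a terminal emulator,
// in which case a different insertion method is used.
func frontmostAppIsTerminal() -> Bool {
    guard let bundleID = NSWorkspace.shared.frontmostApplication?.bundleIdentifier else {
        return false
    }
    return terminalBundleIDs.contains(bundleID)
}
```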

Model download fails

  1. Check your internet connection
  2. Try a smaller model first (base or tiny)
  3. Check available disk space

Shortcut not working

  1. Ensure Accessibility permission is granted
  2. Check System Settings → Privacy & Security → Accessibility
  3. Toggle VoxScript off and on in the list

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT License - see LICENSE for details.

See Also

  • VoxScript PRD - Full product requirements document with implementation notes
