Local AI-powered dictation for macOS using WhisperKit
```
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│             🎤 VoxScript - Local Dictation for macOS            │
│                                                                 │
│   ┌─────────────┐      ┌──────────────┐      ┌─────────────┐    │
│   │   Record    │  ->  │  WhisperKit  │  ->  │  Insert at  │    │
│   │    Audio    │      │  Transcribe  │      │   Cursor    │    │
│   └─────────────┘      └──────────────┘      └─────────────┘    │
│                               │                                 │
│                               ▼                                 │
│                        ┌──────────────┐                         │
│                        │    Ollama    │                         │
│                        │  (Optional)  │                         │
│                        └──────────────┘                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
- 100% Local Processing - All transcription happens on-device using Apple Silicon
- Global Hotkeys - Press ⌘⇧Space anywhere to start/stop recording
- Multiple Recording Modes - Toggle, Push-to-Talk, or Continuous with silence detection
- Optional Post-Processing - Clean up text with local Ollama LLM
- Menu Bar App - Runs quietly in the background
- Works Everywhere - Text insertion works in standard apps AND terminal emulators
```
┌─────────────────────────────────────────────────────────────┐
│                        VoxScript.app                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                     VoxScriptApp                     │   │
│  │                  (App Entry Point)                   │   │
│  └──────────────────────────────────────────────────────┘   │
│                              │                              │
│             ┌────────────────┼────────────────┐             │
│             ▼                ▼                ▼             │
│      ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐   │
│      │  StatusBar  │ │ FloatingPanel│ │  HotkeyManager  │   │
│      │ Controller  │ │  Controller  │ │                 │   │
│      └─────────────┘ └──────────────┘ └─────────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                    Core Services                     │   │
│  ├────────────────┬─────────────────────┬───────────────┤   │
│  │ AudioRecorder  │ TranscriptionEngine │ PostProcessor │   │
│  │(AVAudioEngine) │    (WhisperKit)     │   (Ollama)    │   │
│  └────────────────┴─────────────────────┴───────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                        Models                        │   │
│  ├─────────────┬─────────────┬──────────────────────────┤   │
│  │  AppState   │  Settings   │   TranscriptionResult    │   │
│  └─────────────┴─────────────┴──────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
- macOS 14.0+ (Sonoma or later)
- Apple Silicon (M1/M2/M3/M4)
- ~1-2GB disk space for Whisper model
Download the latest DMG from the Releases page.
```bash
# Clone the repository
git clone https://github.com/davidcv5/VoxScript.git
cd VoxScript

# Open in Xcode
open VoxScript.xcodeproj

# Build and Run (⌘R)
```

Or build via the command line:

```bash
xcodebuild -project VoxScript.xcodeproj -scheme VoxScript -configuration Release
```

| Package | Version | Purpose |
|---|---|---|
| WhisperKit | 0.15.0+ | Speech-to-text engine |
| KeyboardShortcuts | 2.0.0+ | Global hotkey handling |
- Ollama - For post-processing text cleanup (install separately)
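For reference, post-processing talks to Ollama's local HTTP API. The sketch below is illustrative, not VoxScript's actual `PostProcessor`; it assumes the `llama3.2` model mentioned later in this README and Ollama's standard `/api/generate` endpoint:

```swift
import Foundation

// Hypothetical post-processing call against a locally running Ollama
// server. The prompt wording is an assumption for illustration.
func cleanUp(_ transcript: String) async throws -> String {
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": "llama3.2",
        "prompt": "Fix punctuation and casing; change nothing else:\n\(transcript)",
        "stream": false  // ask for one JSON object instead of a token stream
    ])
    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    // Ollama returns the generated text in the "response" field.
    return json?["response"] as? String ?? transcript
}
```

If Ollama is not running, the call fails and the raw transcript can be used unchanged.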
- Launch VoxScript - It appears in the menu bar
- Press ⌘⇧Space to start recording
- Speak your text
- Press ⌘⇧Space again to stop and transcribe
- Text is automatically inserted at cursor
On first launch, VoxScript will:
- Request Microphone permission
- Request Accessibility permission (for global shortcuts)
- Download the default Whisper model (~1GB)
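These prompts correspond to standard macOS APIs. A minimal sketch of what a first-run permission check might look like (function name is illustrative):

```swift
import AVFoundation
import ApplicationServices

// Sketch of first-run permission checks using the standard macOS APIs.
func requestPermissions() {
    // Microphone: triggers the system prompt on first call.
    AVCaptureDevice.requestAccess(for: .audio) { granted in
        print("Microphone granted:", granted)
    }
    // Accessibility: kAXTrustedCheckOptionPrompt asks macOS to show the
    // System Settings prompt if the app is not yet trusted.
    let options = [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: true] as CFDictionary
    let trusted = AXIsProcessTrustedWithOptions(options)
    print("Accessibility trusted:", trusted)
}
```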
| Action | Shortcut |
|---|---|
| Toggle Recording | ⌘⇧Space |
| Cancel Recording | Escape |
| Open Settings | ⌘, |
| Mode | Behavior |
|---|---|
| Toggle | Press to start, press again to stop |
| Push-to-Talk | Hold key to record, release to transcribe |
| Continuous | Auto-stops after detecting silence (2s) |
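Continuous mode's auto-stop can be sketched as an RMS check on the audio input tap. Only the 2 s window comes from the table above; the 0.01 threshold and buffer size below are assumptions, not VoxScript's actual values:

```swift
import AVFoundation

// Illustrative silence detection: tap the input node, compute RMS per
// buffer, and fire a callback after ~2 s below a threshold.
final class SilenceDetector {
    private let engine = AVAudioEngine()
    private var silentFor: TimeInterval = 0
    var onSilence: (() -> Void)?

    func start() throws {
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
            guard let samples = buffer.floatChannelData?[0] else { return }
            let n = Int(buffer.frameLength)
            let rms = sqrt((0..<n).reduce(Float(0)) { $0 + samples[$1] * samples[$1] } / Float(n))
            let bufferDuration = Double(n) / format.sampleRate
            // Accumulate quiet time; any loud buffer resets the counter.
            self.silentFor = rms < 0.01 ? self.silentFor + bufferDuration : 0
            if self.silentFor >= 2.0 { self.onSilence?() }
        }
        try engine.start()
    }
}
```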
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| large-v3-turbo | ~950MB | Fast | Excellent |
| large-v3 | ~1.5GB | Slower | Best |
| small.en | ~460MB | Very fast | Good (English) |
| base | ~140MB | Fastest | Basic |
| tiny | ~75MB | Instant | Testing only |
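Loading one of these models with WhisperKit looks roughly like this; exact signatures vary between WhisperKit releases, so treat it as a sketch rather than VoxScript's actual `TranscriptionEngine`:

```swift
import WhisperKit

// Sketch: load the default model and transcribe a recorded WAV file.
// The audio path is a placeholder.
let pipe = try await WhisperKit(model: "large-v3-turbo")
let results = try await pipe.transcribe(audioPath: "/tmp/recording.wav")
print(results.map(\.text).joined())
```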
```
┌──────────┐     ┌───────────────┐     ┌──────────────┐
│ Mic/User │────▶│ AudioRecorder │────▶│ Temp WAV File│
└──────────┘     └───────────────┘     └──────────────┘
                                              │
                                              ▼
                           ┌──────────────────────────────────────┐
                           │         TranscriptionEngine          │
                           │   ┌──────────────────────────────┐   │
                           │   │          WhisperKit          │   │
                           │   │  ┌────────┐   ┌───────────┐  │   │
                           │   │  │ Model  │ + │ CoreML/ANE│  │   │
                           │   │  └────────┘   └───────────┘  │   │
                           │   └──────────────────────────────┘   │
                           └──────────────────────────────────────┘
                                              │
                                              ▼
                                   ┌──────────────────────┐
                                   │ TranscriptionResult  │
                                   │  { text, language }  │
                                   └──────────────────────┘
                                              │
                         ┌────────────────────┼────────────────────┐
                         │                    │                    │
                         ▼                    │                    ▼
                ┌─────────────────┐           │          ┌─────────────────┐
                │ Post-Processing │◀──────────┘          │ ClipboardManager│
                │   (Optional)    │                      │                 │
                │ ┌─────────────┐ │                      │  ┌───────────┐  │
                │ │   Ollama    │ │                      │  │  Paste/   │  │
                │ │  llama3.2   │ │──────────────────────▶  │  Insert   │  │
                │ └─────────────┘ │                      │  └───────────┘  │
                └─────────────────┘                      └─────────────────┘
                                                                  │
                                                                  ▼
                                                           ┌──────────────┐
                                                           │  Target App  │
                                                           │ (at cursor)  │
                                                           └──────────────┘
```
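The Paste/Insert step in the diagram can be approximated with the standard pasteboard-plus-synthetic-⌘V technique. This is a sketch, not VoxScript's actual `ClipboardManager`:

```swift
import AppKit
import Carbon.HIToolbox  // kVK_ANSI_V

// Sketch of paste-based insertion: put the text on the pasteboard,
// then synthesize ⌘V. Requires the Accessibility permission.
func insertViaPaste(_ text: String) {
    let pb = NSPasteboard.general
    pb.clearContents()
    pb.setString(text, forType: .string)

    let src = CGEventSource(stateID: .combinedSessionState)
    let keyDown = CGEvent(keyboardEventSource: src, virtualKey: CGKeyCode(kVK_ANSI_V), keyDown: true)
    let keyUp = CGEvent(keyboardEventSource: src, virtualKey: CGKeyCode(kVK_ANSI_V), keyDown: false)
    keyDown?.flags = .maskCommand
    keyUp?.flags = .maskCommand
    keyDown?.post(tap: .cghidEventTap)
    keyUp?.post(tap: .cghidEventTap)
}
```

A downside of this approach is that it overwrites the user's clipboard, which is one reason direct insertion exists as a setting.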
Access via menu bar icon → Settings (⌘,)
- General: Launch at login, sounds, floating indicator
- Transcription: Model selection, language, post-processing
- Shortcuts: Customize keyboard shortcuts
- Advanced: Insert directly, trailing newline, silence detection
- All processing happens locally on your device
- No data is sent to any cloud service
- Audio is only saved temporarily during transcription
- No telemetry or usage tracking
```
VoxScript/
├── Package.swift                      # Swift Package Manager dependencies
├── VoxScript.xcodeproj/
├── VoxScript/
│   ├── VoxScriptApp.swift             # Main app entry point
│   ├── Info.plist                     # App configuration
│   ├── VoxScript.entitlements         # Audio, automation entitlements
│   ├── Core/
│   │   ├── TranscriptionEngine.swift  # WhisperKit wrapper (singleton)
│   │   ├── AudioRecorder.swift        # AVAudioEngine recording
│   │   ├── HotkeyManager.swift        # KeyboardShortcuts wrapper
│   │   ├── ClipboardManager.swift     # Text insertion with terminal detection
│   │   └── PostProcessor.swift        # Ollama integration
│   ├── UI/
│   │   ├── FloatingPanel/             # Recording indicator
│   │   ├── Settings/                  # Settings tabs
│   │   ├── Onboarding/                # First-run setup
│   │   └── MenuBar/                   # Status bar controller
│   ├── Models/
│   │   ├── AppState.swift             # Observable app state
│   │   ├── Settings.swift             # User preferences
│   │   └── TranscriptionResult.swift  # Result model
│   └── Utilities/
│       ├── Permissions.swift          # Permission helpers
│       └── SoundPlayer.swift          # Audio feedback
└── VoxScriptTests/                    # Unit tests
```
VoxScript automatically detects terminal apps and uses a different insertion method. If it's still not working:
- Open Settings → Advanced
- Disable "Insert directly"
- Manually paste with ⌘V after transcription
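For context, terminal detection typically keys off the frontmost app's bundle identifier. The list below is illustrative, not VoxScript's actual list:

```swift
import AppKit

// Hypothetical terminal detection: compare the frontmost app's bundle
// identifier against known terminal emulators.
let terminalBundleIDs: Set<String> = [
    "com.apple.Terminal",
    "com.googlecode.iterm2",
    "dev.warp.Warp-Stable",
]

func frontmostAppIsTerminal() -> Bool {
    guard let bundleID = NSWorkspace.shared.frontmostApplication?.bundleIdentifier else {
        return false
    }
    return terminalBundleIDs.contains(bundleID)
}
```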
If the model download fails:
- Check your internet connection
- Try a smaller model first (base or tiny)
- Check available disk space
If global shortcuts don't respond:
- Ensure Accessibility permission is granted
- Check System Settings → Privacy & Security → Accessibility
- Toggle VoxScript off and on in the list
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License - see LICENSE for details.
- WhisperKit by Argmax
- Whisper by OpenAI
- KeyboardShortcuts by Sindre Sorhus
- Ollama for local LLM inference
- VoxScript PRD - Full product requirements document with implementation notes