Summary
Upgrade and improve speech-to-text capabilities beyond the current WhisperKit implementation, potentially leveraging Apple's enhanced on-device speech recognition in iOS 18+.
Context
WhisperCode currently uses WhisperKit for voice input on iOS 18+, with a fallback to SFSpeechRecognizer (Apple's Speech framework) for older devices. While this works for basic voice input, there's potential for significant quality improvements.
Current Implementation
The app currently:
- Uses WhisperKit with the `openai_whisper-tiny.en` model on iOS 18+
- Falls back to SFSpeechRecognizer for older iOS versions
- The fallback handles audio recording and resampling for Whisper compatibility
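The selection logic described above boils down to an availability check. A minimal sketch (the `SpeechEngine` enum and `selectEngine(iosMajorVersion:)` helper are illustrative names, not taken from the WhisperCode codebase):

```swift
import Foundation

// Hypothetical sketch of the engine-selection logic. In the real app
// this would be an `if #available(iOS 18, *)` check; a plain version
// parameter is used here so the logic is easy to exercise in isolation.
enum SpeechEngine: String {
    case whisperKit   // openai_whisper-tiny.en on iOS 18+
    case sfSpeech     // SFSpeechRecognizer fallback on older devices
}

func selectEngine(iosMajorVersion: Int) -> SpeechEngine {
    // WhisperKit requires iOS 18+; older versions fall back to Apple's
    // Speech framework, which also handles recording and resampling
    // audio into the 16 kHz mono format Whisper expects.
    iosMajorVersion >= 18 ? .whisperKit : .sfSpeech
}
```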
Motivation
- Better accuracy: Newer Apple speech recognition models may offer improved transcription quality
- On-device processing: iOS 18+ offers enhanced on-device speech recognition with better privacy
- Reduced latency: Apple's native speech framework may be faster than running Whisper locally
- Language support: Better multilingual support through native iOS speech recognition
- Battery efficiency: Potentially lower power consumption using Apple's optimized speech recognition
Considerations
- iOS 26 Speech Recognition: Research what's new in Apple's speech recognition (SFSpeechRecognizer) for iOS 26; there may be improved on-device models, better accuracy, or new APIs
- Hybrid Approach: Consider using Apple Speech for most cases, with WhisperKit as a fallback or for specific use cases where it excels
- Model Selection: If continuing with WhisperKit, evaluate larger/better models beyond "tiny"
- Quality vs Speed Tradeoff: Balance transcription quality against latency for real-time voice input
- Offline Support: Ensure robust functionality without network connectivity
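The hybrid approach above could be structured as a small fallback chain behind a common interface. A sketch, assuming a hypothetical `Transcriber` protocol (the real conforming types would wrap Apple Speech and WhisperKit respectively; a synchronous signature is used here for brevity):

```swift
import Foundation

// Hypothetical abstraction over the two engines. The real WhisperKit
// and SFSpeechRecognizer wrappers would be asynchronous; this sketch
// keeps the signature synchronous to focus on the fallback logic.
protocol Transcriber {
    func transcribe(_ audioURL: URL) throws -> String
}

struct HybridTranscriber: Transcriber {
    let primary: any Transcriber    // e.g. Apple Speech on iOS 26
    let fallback: any Transcriber   // e.g. WhisperKit

    func transcribe(_ audioURL: URL) throws -> String {
        do {
            return try primary.transcribe(audioURL)
        } catch {
            // Any primary-engine failure (model unavailable, language
            // unsupported, low-confidence result) falls through to the
            // fallback engine.
            return try fallback.transcribe(audioURL)
        }
    }
}
```

A design like this also makes A/B benchmarking straightforward, since either engine can be swapped in behind the same protocol.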
Potential Improvements
- Upgrade to larger Whisper models (base, medium) where device resources permit
- Better handling of technical terminology (code-specific vocabulary)
- Improved punctuation and formatting
- Better handling of interruptions and corrections
- Multi-language support improvements
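For the technical-terminology point above, Apple's Speech framework lets you bias recognition toward expected phrases via `SFSpeechRecognitionRequest.contextualStrings`. A sketch of a hypothetical helper that assembles such a code-specific phrase list (the vocabulary shown is an illustrative sample, not an exhaustive set):

```swift
import Foundation

// Hypothetical helper that builds a code-specific phrase list. With the
// Speech framework available, the returned array would be assigned to
// `request.contextualStrings` on an SFSpeechRecognitionRequest.
func codeVocabulary(extraSymbols: [String] = []) -> [String] {
    let base = ["async await", "SwiftUI", "UIKit", "enum", "struct",
                "protocol", "guard let", "nil coalescing"]
    // Deduplicate while preserving order, so user-supplied symbols that
    // overlap the base list don't appear twice in the bias list.
    var seen = Set<String>()
    return (base + extraSymbols).filter { seen.insert($0).inserted }
}
```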
Scope
- Audit current speech-to-text quality and identify pain points
- Research iOS 26 speech recognition capabilities
- Benchmark Apple Speech vs WhisperKit accuracy
- Implement improvements with proper fallback handling
- Add settings for users to choose preferred engine
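The engine-preference setting in the last scope item could be a simple `UserDefaults`-backed value. A sketch, assuming a hypothetical `preferredSpeechEngine` key and `PreferredEngine` enum (names are placeholders for whatever the settings screen would actually store):

```swift
import Foundation

// Hypothetical user-facing engine preference, persisted in UserDefaults.
enum PreferredEngine: String, CaseIterable {
    case automatic    // let the app pick based on OS version
    case appleSpeech
    case whisperKit
}

struct EnginePreference {
    let defaults: UserDefaults
    private let key = "preferredSpeechEngine"

    var value: PreferredEngine {
        get {
            // Unknown or missing stored values degrade gracefully to
            // automatic selection.
            defaults.string(forKey: key)
                .flatMap(PreferredEngine.init(rawValue:)) ?? .automatic
        }
        nonmutating set { defaults.set(newValue.rawValue, forKey: key) }
    }
}
```

Keeping `.automatic` as the default preserves today's behavior for users who never open the setting.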
Labels
- enhancement
- ios-specific