Upgrade speech-to-text with enhanced Apple Speech recognition #7

@BillSteinUNB

Summary

Improve speech-to-text quality beyond the current WhisperKit implementation, potentially by leveraging Apple's enhanced on-device speech recognition in iOS 18+.

Context

WhisperCode currently uses WhisperKit for voice input on iOS 18+, with a fallback to SFSpeechRecognizer (Apple's Speech framework) on older iOS versions. This works for basic voice input, but there is room for significant quality improvements.

Current Implementation

The app currently:

  • Uses WhisperKit with the openai_whisper-tiny.en model on iOS 18+
  • Falls back to SFSpeechRecognizer for older iOS versions
  • Handles audio recording and resampling for Whisper compatibility in the fallback path
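
The selection rule described above can be sketched as follows. This is a minimal illustration, not the app's actual code: the `SpeechEngine` enum and `selectEngine` function are hypothetical names, and only the model identifier and the iOS 18 cutoff come from this issue.

```swift
import Foundation

/// Hypothetical engine identifiers (illustrative names, not from the WhisperCode source).
enum SpeechEngine: Equatable {
    case whisperKit(model: String)
    case appleSpeech
}

/// Picks an engine from the OS major version, mirroring the rule in this issue:
/// WhisperKit with the tiny English model on iOS 18+, SFSpeechRecognizer otherwise.
func selectEngine(osMajorVersion: Int) -> SpeechEngine {
    if osMajorVersion >= 18 {
        return .whisperKit(model: "openai_whisper-tiny.en")
    }
    return .appleSpeech
}

// At runtime the major version can be read from ProcessInfo.
let currentMajor = ProcessInfo.processInfo.operatingSystemVersion.majorVersion
print("Selected engine: \(selectEngine(osMajorVersion: currentMajor))")
```

Keeping the decision in a pure function makes it easy to unit test and to extend later with a user preference.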

Motivation

  • Better accuracy: Newer Apple speech recognition models may offer improved transcription quality
  • On-device processing: iOS 18+ offers enhanced on-device speech recognition, so audio can stay on the device
  • Reduced latency: Apple's native speech framework may be faster than running Whisper locally
  • Language support: Native iOS speech recognition may offer better multilingual support
  • Battery efficiency: Apple's optimized speech recognition may consume less power

Considerations

  1. iOS 26 Speech Recognition: Research what's new in Apple's Speech framework for iOS 26, covering both SFSpeechRecognizer and the newer SpeechAnalyzer API. There may be improved on-device models, better accuracy, or new APIs

  2. Hybrid Approach: Consider using Apple Speech for most cases, with WhisperKit as a fallback or for specific use cases where it excels

  3. Model Selection: If continuing with WhisperKit, evaluate larger/better models beyond "tiny"

  4. Quality vs Speed Tradeoff: Balance transcription quality against latency for real-time voice input

  5. Offline Support: Ensure robust functionality without network connectivity
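
One possible shape for the hybrid approach in item 2 is a wrapper that tries a primary engine and switches to a secondary one when the primary fails or reports low confidence. The sketch below is an assumption-laden illustration: `Transcript`, `Transcriber`, `HybridTranscriber`, and `StubTranscriber` are hypothetical types, the confidence threshold is arbitrary, and real adapters around Apple Speech or WhisperKit would replace the stubs.

```swift
/// Hypothetical result type; confidence is assumed to be normalized to 0.0...1.0.
struct Transcript: Equatable {
    let text: String
    let confidence: Double
}

/// Hypothetical abstraction over a speech engine (Apple Speech, WhisperKit, ...).
protocol Transcriber {
    func transcribe(_ samples: [Float]) throws -> Transcript
}

/// Tries the primary engine first; uses the fallback when the primary throws
/// or returns a result below the confidence threshold.
struct HybridTranscriber: Transcriber {
    let primary: any Transcriber
    let fallback: any Transcriber
    let minimumConfidence: Double

    func transcribe(_ samples: [Float]) throws -> Transcript {
        if let result = try? primary.transcribe(samples),
           result.confidence >= minimumConfidence {
            return result
        }
        return try fallback.transcribe(samples)
    }
}

/// Stand-in engine for illustration; returns a fixed result or throws.
struct StubTranscriber: Transcriber {
    struct Unavailable: Error {}
    let result: Transcript?

    func transcribe(_ samples: [Float]) throws -> Transcript {
        guard let result = result else { throw Unavailable() }
        return result
    }
}

let apple = StubTranscriber(result: Transcript(text: "open main.swift", confidence: 0.91))
let whisper = StubTranscriber(result: Transcript(text: "open main dot swift", confidence: 0.42))
let hybrid = HybridTranscriber(primary: apple, fallback: whisper, minimumConfidence: 0.6)
if let transcript = try? hybrid.transcribe([]) {
    print(transcript.text)  // prints "open main.swift"
}
```

Because both engines sit behind the same protocol, swapping which one is primary becomes a one-line change, which also suits a user-facing engine setting.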

Potential Improvements

  • Upgrade to larger Whisper models (base, medium) if device permits
  • Better handling of technical terminology (code-specific vocabulary)
  • Improved punctuation and formatting
  • Better handling of interruptions and corrections
  • Multi-language support improvements

Scope

  • Audit current speech-to-text quality and identify pain points
  • Research iOS 26 speech recognition capabilities
  • Benchmark Apple Speech vs WhisperKit accuracy
  • Implement improvements with proper fallback handling
  • Add settings for users to choose preferred engine
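
For the benchmarking item, a common accuracy metric is word error rate (WER): the word-level edit distance between a reference transcript and an engine's output, divided by the number of reference words. The sketch below is one way to compute it. The function names are hypothetical, and the normalization (lowercasing and trimming punctuation) is an assumption that should match whatever formatting differences the benchmark is meant to ignore.

```swift
import Foundation

/// Lowercases, splits on whitespace, and trims surrounding punctuation so that
/// capitalization and punctuation differences do not count as errors.
func normalizedWords(_ text: String) -> [String] {
    text.lowercased()
        .split(whereSeparator: { $0.isWhitespace })
        .map { String($0).trimmingCharacters(in: .punctuationCharacters) }
        .filter { !$0.isEmpty }
}

/// Word error rate: (substitutions + deletions + insertions) / reference word count.
/// Convention assumed here: an empty reference yields 0 for an empty hypothesis, else 1.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = normalizedWords(reference)
    let hyp = normalizedWords(hypothesis)
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }

    // previous[j] holds the edit distance between the reference words processed
    // so far and the first j hypothesis words.
    var previous = Array(0...hyp.count)
    for (i, refWord) in ref.enumerated() {
        var current = [i + 1] + Array(repeating: 0, count: hyp.count)
        for (j, hypWord) in hyp.enumerated() {
            let substitution = previous[j] + (refWord == hypWord ? 0 : 1)
            let deletion = previous[j + 1] + 1
            let insertion = current[j] + 1
            current[j + 1] = min(substitution, deletion, insertion)
        }
        previous = current
    }
    return Double(previous[hyp.count]) / Double(ref.count)
}

// One substituted word out of four reference words.
print(wordErrorRate(reference: "Open the main file.", hypothesis: "open a main file"))  // prints 0.25
```

Running both engines over the same set of recorded prompts and comparing their average WER would give a like-for-like accuracy number; latency and power use would need separate measurement.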

Labels

  • enhancement
  • ios-specific
