Feature/audio input#182

Merged
DenisovAV merged 10 commits into main from feature/audio-input
Feb 4, 2026

@DenisovAV
Owner

No description provided.

Audio support:
- Add supportAudio parameter through full chain (Dart -> Native)
- Add setAudioModelOptions() in Android native for MediaPipe
- Add audio recording UI in chat_input_field.dart
- Add audio playback in chat_message.dart
- Disable audio for .task models (no TF_LITE_AUDIO_ENCODER)

Download improvements:
- Add foreground parameter for Android large model downloads
- SmartDownloader auto-detects allowPause based on server response
- Remove automatic retries, keep manual retry only

Desktop fixes:
- Add maxNumImages parameter to grpc_client.initialize()
- Fix vision parameter passing chain

Tests:
- Add pigeon_support_audio_test.dart
- Add desktop_vision_params_test.dart

Audio Input:
- Add audio recording and conversion in chat_input_field
- Support audio bytes in gRPC client and server
- Add chatWithAudio method to desktop inference model
- Update proto with audio message support

Desktop Fixes:
- Switch to Azul Zulu JRE 24 (fixes Jinja template errors)
- Add SHA256 checksums for JRE verification
- Fix vision enable logic to match Android (maxNumImages > 0)
- Document vision limitation on macOS (SDK bug #684)
- Fix MediaPipe supportsAudio flag (audio is LiteRT-LM only)
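
The checksum item above amounts to hashing the downloaded JRE archive and comparing the digest with a pinned value before unpacking it. A minimal sketch of that check, written in Python for brevity rather than taken from the project's setup scripts. The function names and the archive path in the usage note are hypothetical, and the real scripts may do this differently:

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large archives are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """True only if the file's digest matches the pinned checksum."""
    # Normalise the pinned value so case and trailing newlines do not matter.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A setup script would call something like `verify_download("jre.tar.gz", PINNED_SHA256)` (both names are placeholders) and abort the install when it returns `False`.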

Tests:
- Add desktop gRPC integration tests
- Add LiteRtLmSession unit tests

MediaPipe Engine:
- Add audio capability validation in createSession()
- Add consistent error handling in generateResponse()

Desktop:
- Add buffer cleanup in session close() to prevent memory leaks
- Add thread safety documentation for session class
- Add shutdown RPC before killing server process
- Fail fast on chatWithImage when vision not enabled
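
The last item, together with the earlier "maxNumImages > 0" rule, describes a simple guard: vision counts as enabled only when at least one image is allowed, and an image call on a text-only session is rejected before any request is sent. A rough sketch of that pattern in Python. The class, method, and exception names here are hypothetical stand-ins, not the plugin's Dart or Kotlin API:

```python
class VisionNotEnabledError(RuntimeError):
    """Raised when an image is sent to a session created without vision."""


class DesktopSession:
    """Hypothetical stand-in for the desktop inference session."""

    def __init__(self, max_num_images: int = 0) -> None:
        self.max_num_images = max_num_images

    @property
    def vision_enabled(self) -> bool:
        # Rule from the commit message: vision is on only when
        # at least one image per prompt is allowed.
        return self.max_num_images > 0

    def chat_with_image(self, text: str, image: bytes) -> dict:
        # Fail fast: reject the call before any bytes reach the server.
        if not self.vision_enabled:
            raise VisionNotEnabledError(
                "chat_with_image called but max_num_images is 0"
            )
        # Placeholder for the real gRPC request.
        return {"text": text, "image_size": len(image)}
```

Raising at the call site gives the caller a clear configuration error instead of an opaque failure from the native library later on.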

Server:
- Document WAV audio format expectation

Example:
- Fix audio error message (MediaPipe limitation, not iOS)

Documentation:
- Add Platform Limitations table with vision/audio support
- Document iOS Simulator, macOS vision issues
- Replace Temurin JRE 21 with Azul Zulu JRE 24 on all desktop platforms
  (Temurin causes Jinja template errors with LiteRT-LM native library)
- Update JAR version from 0.1.0 to 0.12.3
- Update all checksums for new JRE and JAR
- Update DESKTOP_SUPPORT.md with Vision/Audio feature columns

Copilot AI left a comment

Pull request overview

This PR adds audio input support for Gemma 3n E2B/E4B models, enabling voice-to-text and multimodal interactions. It includes significant infrastructure upgrades (JRE switch from Temurin 21 to Azul Zulu 24) and comprehensive platform implementations across Android, Desktop, and Web.

Changes:

  • Audio input API with supportAudio parameter and addAudio() method for Android, Desktop (macOS/Windows/Linux), and Web platforms
  • JRE upgrade from Adoptium Temurin 21 to Azul Zulu 24 to fix Jinja template errors with LiteRT-LM native library
  • LiteRT-LM SDK update from 0.9.0-alpha01 to 0.9.0-alpha02 with Contents API support for multimodal messages
  • Enhanced download service with Android foreground service support for large files (>500MB)
  • Desktop bug fixes for text chat and callback-based streaming API

Reviewed changes

Copilot reviewed 77 out of 91 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pigeon.dart | Added supportAudio and enableAudioModality parameters, addAudio method to PlatformService interface |
| lib/pigeon.g.dart | Generated Dart code for new audio API methods |
| ios/Classes/PigeonInterface.g.swift | Generated Swift code with audio support (iOS returns error - not supported) |
| android/src/main/kotlin/.../PigeonInterface.g.kt | Generated Kotlin code for Android audio implementation |
| lib/core/message.dart | Added audioBytes field and audio-related factory methods to Message class |
| lib/core/chat.dart | Added supportAudio field to InferenceChat |
| android/src/main/kotlin/.../engines/ | Audio support in MediaPipe and LiteRT-LM engines with proper error handling |
| litertlm-server/src/main/kotlin/ | Desktop gRPC server audio implementation with WAV format support |
| lib/desktop/grpc_client.dart | Added chatWithAudio method and audio parameters to initialization |
| lib/web/flutter_gemma_web.dart | Web platform audio support with AudioPromptPart |
| windows/scripts/setup_desktop.ps1 | JRE upgrade to Azul Zulu 24, version 0.12.3 |
| macos/scripts/setup_desktop.sh | JRE upgrade to Azul Zulu 24, version 0.12.3 |
| linux/scripts/setup_desktop.sh | JRE upgrade to Azul Zulu 24, version 0.12.3 |
| lib/mobile/smart_downloader.dart | Android foreground service configuration for large downloads |
| example/lib/utils/audio_converter.dart | Audio format conversion utilities (PCM/WAV, resampling) |
| example/lib/chat_input_field.dart | Audio recording UI with microphone button and waveform display |
| test/pigeon_support_audio_test.dart | Integration tests for audio parameter passing through Pigeon |
| README.md | Documentation updates for audio features and foreground downloads |
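
The audio converter and gRPC server entries both revolve around WAV: recorded PCM samples are wrapped in a WAV container before being handed to inference. A rough illustration of that wrapping step using Python's standard `wave` module, not the project's Dart code. The 16 kHz, mono, 16-bit defaults are assumptions, since the PR does not state the exact format the server expects:

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)    # assumed mono
        w.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate) # assumed 16 kHz
        w.writeframes(pcm)
    # The wave module leaves the caller-supplied buffer open, so it can be read here.
    return buf.getvalue()
```

The result is the original samples preceded by a standard RIFF/WAVE header, which is the kind of payload a client could pass as audio bytes.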

@DenisovAV DenisovAV merged commit f7430f0 into main Feb 4, 2026
3 checks passed