Add LiteRT-LM engine support for Android (.litertlm models) #176
Implement Strategy pattern for inference engines with two backends:
- MediaPipe (existing .task files)
- LiteRT-LM (new .litertlm files with multimodal support)

Key changes:
- Add InferenceEngine interface with Engine/Session abstractions
- Add EngineFactory for automatic engine selection based on file extension
- Implement LiteRtLmEngine with visionBackend for multimodal models
- Implement LiteRtLmSession with chunk buffering for MediaPipe compatibility
- Add thread-safety (synchronized locks) in FlutterGemmaPlugin
- Add LiteRT-LM SDK dependency (0.9.0-alpha01)
- Add gemma3n LiteRT-LM model options in example app
- Add unit tests for engines

Tested with Gemma 3 Nano E2B multimodal (text + image) on Pixel 8.
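The "chunk buffering for MediaPipe compatibility" item can be pictured roughly as follows. This is a minimal sketch, assuming the backend expects one complete prompt while callers deliver text piece by piece through addQueryChunk. `BufferingSession` and its `generate` callback are hypothetical stand-ins, not the PR's actual LiteRtLmSession.

```kotlin
// Hypothetical sketch: not the PR's actual LiteRtLmSession.
class BufferingSession(private val generate: (String) -> String) {
    private val lock = Any()
    private val pending = StringBuilder()

    // Accumulate chunks instead of forwarding each one to the backend.
    fun addQueryChunk(chunk: String) {
        synchronized(lock) { pending.append(chunk) }
    }

    // Hand the whole buffered prompt to the backend once, then reset the buffer.
    fun generateResponse(): String {
        val prompt = synchronized(lock) {
            val text = pending.toString()
            pending.setLength(0)
            text
        }
        return generate(prompt)
    }
}
```

The buffer is drained under the lock but the backend call runs outside it, so a slow generation does not block further addQueryChunk calls.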
Pull request overview
This pull request adds support for LiteRT-LM models (.litertlm files) to the Flutter Gemma plugin by introducing a Strategy Pattern-based engine abstraction layer. The PR refactors the existing MediaPipe inference code into adapters and adds a new LiteRT-LM engine implementation alongside it.
Changes:
- Introduces InferenceEngine and InferenceSession abstractions with MediaPipe and LiteRT-LM implementations
- Adds EngineFactory for automatic engine selection based on model file extension
- Updates FlutterGemmaPlugin to use the new abstraction layer with improved thread safety
- Adds two new Gemma 3 Nano model variants (2B and 4B) using LiteRT-LM format in the example app
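Extension-based engine selection, as described in the list above, might look something like the sketch below. The names `EngineKind` and `engineKindFor` are hypothetical, and the sketch assumes only the two extensions mentioned in this PR are accepted; the real EngineFactory.createFromModelPath returns an engine instance and may handle more cases.

```kotlin
// Hypothetical sketch of extension-based engine selection; names are illustrative.
enum class EngineKind { MEDIAPIPE, LITERT_LM }

fun engineKindFor(modelPath: String): EngineKind {
    // Look at the file name only, so dots in directory names are ignored.
    val fileName = modelPath.substringAfterLast('/')
    val extension = fileName.substringAfterLast('.', missingDelimiterValue = "").lowercase()
    return when (extension) {
        "task" -> EngineKind.MEDIAPIPE
        "litertlm" -> EngineKind.LITERT_LM
        "" -> throw IllegalArgumentException("Model file has no extension: $modelPath")
        else -> throw IllegalArgumentException("Unsupported model format: .$extension")
    }
}
```

Calling `engineKindFor("/data/models/gemma.task")` selects the MediaPipe branch, while a path without an extension fails with a dedicated message rather than a generic "unsupported format" error.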
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/InferenceEngine.kt | Core engine abstraction interface defining initialization, session creation, and capabilities |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/InferenceSession.kt | Session abstraction interface for text/image input and response generation |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineConfig.kt | Configuration data classes and SharedFlow factory for both engines |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineFactory.kt | Factory for automatic engine selection based on file extension |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/mediapipe/MediaPipeEngine.kt | Adapter wrapping existing MediaPipe LlmInference implementation |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/mediapipe/MediaPipeSession.kt | Adapter wrapping existing MediaPipe session implementation |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmEngine.kt | New LiteRT-LM engine implementation with caching support |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmSession.kt | New LiteRT-LM session with chunk buffering for MediaPipe compatibility |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt | Updated to use engine abstraction with enhanced synchronization and cleanup |
| android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineFactoryTest.kt | Comprehensive tests for factory engine selection logic |
| android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmEngineTest.kt | Unit tests for LiteRT-LM engine capabilities and lifecycle |
| android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmSessionTest.kt | Unit tests for LiteRT-LM session including thread safety and token estimation |
| example/lib/models/model.dart | Adds Gemma 3 Nano 2B and 4B LiteRT-LM model variants, fixes local model filename |
| android/build.gradle | Adds LiteRT-LM SDK dependency (v0.9.0-alpha01) |
Comments suppressed due to low confidence (3)
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:142
- Resource leak on initialization failure: If newEngine.initialize() at line 129 throws an exception, the newEngine instance created at line 128 is not closed. This could leak resources if the engine constructor allocated any resources before initialization failed. Consider wrapping the initialize call in a try-catch that closes the engine on failure before rethrowing.
        // Create and initialize new engine BEFORE clearing old state
        // This ensures we don't leave state inconsistent on failure
        val newEngine = EngineFactory.createFromModelPath(modelPath, context)
        newEngine.initialize(config)
        // Only now clear old state and swap in new engine (thread-safe)
        synchronized(engineLock) {
            session?.close()
            session = null
            engine?.close()
            engine = newEngine
        }
        callback(Result.success(Unit))
    } catch (e: Exception) {
        callback(Result.failure(e))
    }
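The close-on-failure handling suggested in this comment could be sketched as below. `ClosableEngine` and `createInitialized` are hypothetical names standing in for the PR's InferenceEngine and the createModel logic; the sketch only shows the cleanup pattern, not the plugin's real signatures.

```kotlin
// Hypothetical stand-in for the PR's InferenceEngine interface.
interface ClosableEngine : AutoCloseable {
    fun initialize()
}

// Creates and initializes an engine, closing it if initialization throws.
fun <E : ClosableEngine> createInitialized(create: () -> E): E {
    val engine = create()
    try {
        engine.initialize()
    } catch (e: Exception) {
        // Release anything the constructor allocated, then rethrow the original error.
        try {
            engine.close()
        } catch (closeError: Exception) {
            e.addSuppressed(closeError)
        }
        throw e
    }
    return engine
}
```

Attaching a failure from close() as a suppressed exception keeps the initialization error as the one reported to the caller.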
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:216
- Race condition: Session access is not properly synchronized. Lines 209-211 read the session field outside of synchronization, but the session can be nullified by closeSession() (line 198) or createSession() (lines 184-185) concurrently. This could cause null pointer exceptions or use-after-close errors.
The same issue exists in addQueryChunk (222-224), addImage (235-237), generateResponse (248-250), generateResponseAsync (261-263), and stopGeneration (274-276).
Solution: Wrap the session access in synchronized(engineLock) to ensure consistent access across all methods that read or write to the session field.
    override fun sizeInTokens(prompt: String, callback: (Result<Long>) -> Unit) {
        scope.launch {
            try {
                val currentSession = session
                    ?: throw IllegalStateException("Session not created")
                val size = currentSession.sizeInTokens(prompt)
                callback(Result.success(size.toLong()))
            } catch (e: Exception) {
                callback(Result.failure(e))
            }
        }
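One way to apply the proposed fix is sketched below, with the coroutine and callback plumbing left out. `TokenSession` and `SessionHolder` are hypothetical stand-ins for the PR's InferenceSession and the relevant part of FlutterGemmaPlugin. Note that capturing only the reference under the lock would still allow a use-after-close, so this sketch keeps the lock for the duration of the call.

```kotlin
// Hypothetical stand-ins; the real plugin uses InferenceSession and Pigeon callbacks.
interface TokenSession {
    fun sizeInTokens(prompt: String): Int
    fun close()
}

class SessionHolder {
    private val engineLock = Any()
    private var session: TokenSession? = null

    fun createSession(newSession: TokenSession) {
        synchronized(engineLock) {
            session?.close()
            session = newSession
        }
    }

    fun closeSession() {
        synchronized(engineLock) {
            session?.close()
            session = null
        }
    }

    // The null check and the call happen under the same lock as close(),
    // so the session cannot be closed between the two.
    fun sizeInTokens(prompt: String): Long = synchronized(engineLock) {
        val current = session ?: throw IllegalStateException("Session not created")
        current.sizeInTokens(prompt).toLong()
    }
}
```

The trade-off is that a long-running call made this way blocks closeSession() until it returns, which may or may not be acceptable for generation calls.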
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:313
- Missing synchronization on engine access: The engine field is accessed inside synchronized(engineLock) at line 290, but the streamJob creation at line 292 happens inside that scope while accessing engine flows (lines 294, 306). If the engine is closed or replaced between checking it at line 290 and accessing its flows, this could result in flows from a closed/different engine being collected.
Additionally, streamJob is modified at line 292 without synchronization, but is also accessed in onCancel() at line 317 without synchronization, which could cause race conditions.
    override fun onListen(arguments: Any?, events: EventChannel.EventSink?) {
        // Cancel previous stream collection to prevent orphaned coroutines
        streamJob?.cancel()
        eventSink = events
        synchronized(engineLock) {
            val currentEngine = engine ?: return
            streamJob = scope.launch {
                launch {
                    currentEngine.partialResults.collect { (text, done) ->
                        val payload = mapOf("partialResult" to text, "done" to done)
                        withContext(Dispatchers.Main) {
                            events?.success(payload)
                            if (done) {
                                events?.endOfStream()
                            }
                        }
                    }
                }
                launch {
                    currentEngine.errors.collect { error ->
                        withContext(Dispatchers.Main) {
                            events?.error("ERROR", error.message, null)
                        }
                    }
                }
            }
        }
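The streamJob part of this comment comes down to reading and writing the job handle under one lock in both onListen and onCancel. A minimal sketch of that idea follows, with coroutines left out: `Cancellable` is a hypothetical stand-in for a coroutine Job, and `StreamJobHolder` is an illustrative name rather than anything in the PR.

```kotlin
// Hypothetical stand-in for a coroutine Job; only cancel() matters here.
fun interface Cancellable {
    fun cancel()
}

class StreamJobHolder {
    private val lock = Any()
    private var streamJob: Cancellable? = null

    // Called from onListen: cancel any previous collector and store the new one atomically.
    fun replace(start: () -> Cancellable) {
        synchronized(lock) {
            streamJob?.cancel()
            streamJob = start()
        }
    }

    // Called from onCancel: same lock, so it never observes a half-updated job.
    fun cancel() {
        synchronized(lock) {
            streamJob?.cancel()
            streamJob = null
        }
    }
}
```

Because the old job is cancelled and the new one stored in a single critical section, a concurrent onCancel either sees the old job or the new one, never a stale reference.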
- Remove non-existent SDK values: unknown, gpuFloat16, gpuMixed, gpuFull, tpu
- Add NPU backend support for LiteRT-LM (Google Tensor, Qualcomm)
- Simplify backend mapping across all engines
- Use Pigeon-generated PreferredBackend directly instead of PreferredBackendEnum
- Update tests for NPU backend
- Fix Copilot review issues: typo in test comment, error message for missing extension
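A simplified backend mapping of the kind this commit describes might resemble the sketch below. Everything here is illustrative: `PreferredBackendSketch`, `EngineFamily`, and `resolveBackend` are hypothetical names, and both the CPU default for a missing preference and the rejection of NPU for MediaPipe models are assumptions rather than behavior confirmed by the PR.

```kotlin
// Hypothetical enums: the real PreferredBackend is Pigeon-generated and the
// engine-side backend types come from the MediaPipe and LiteRT-LM SDKs.
enum class PreferredBackendSketch { CPU, GPU, NPU }
enum class EngineFamily { MEDIAPIPE, LITERT_LM }

// Returns the backend an engine would use for a requested preference.
fun resolveBackend(family: EngineFamily, preferred: PreferredBackendSketch?): PreferredBackendSketch {
    // Assumption: fall back to CPU when the caller expresses no preference.
    val requested = preferred ?: PreferredBackendSketch.CPU
    // Assumption: NPU is only meaningful for LiteRT-LM (.litertlm) models.
    if (requested == PreferredBackendSketch.NPU && family != EngineFamily.LITERT_LM) {
        throw IllegalArgumentException("NPU is only available for .litertlm models")
    }
    return requested
}
```

With only three values left in the enum, each engine needs a single small mapping instead of handling variants that the SDKs never exposed.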
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (5)
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:213
- Race condition: The session variable is accessed without synchronization. After reading val currentSession = session on line 206, another thread could call closeSession() (lines 191-200) and set session to null, causing the subsequent call to currentSession.sizeInTokens(prompt) to operate on a session that has been closed. The same issue exists in the addQueryChunk, addImage, generateResponse, generateResponseAsync, and stopGeneration methods.
The session variable should either be marked as @Volatile and accessed within synchronized blocks, or the entire method body should be wrapped in synchronized(engineLock) { ... }. Compare with createSession (lines 168-183) and closeSession (lines 192-199), which properly use synchronized(engineLock).
    override fun sizeInTokens(prompt: String, callback: (Result<Long>) -> Unit) {
        scope.launch {
            try {
                val currentSession = session
                    ?: throw IllegalStateException("Session not created")
                val size = currentSession.sizeInTokens(prompt)
                callback(Result.success(size.toLong()))
            } catch (e: Exception) {
                callback(Result.failure(e))
            }
        }
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:227
- Race condition: Same session synchronization issue as in sizeInTokens. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.
    override fun addQueryChunk(prompt: String, callback: (Result<Unit>) -> Unit) {
        scope.launch {
            try {
                val currentSession = session
                    ?: throw IllegalStateException("Session not created")
                currentSession.addQueryChunk(prompt)
                callback(Result.success(Unit))
            } catch (e: Exception) {
                callback(Result.failure(e))
            }
        }
    }
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:240
- Race condition: Same session synchronization issue. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.
    override fun addImage(imageBytes: ByteArray, callback: (Result<Unit>) -> Unit) {
        scope.launch {
            try {
                val currentSession = session
                    ?: throw IllegalStateException("Session not created")
                currentSession.addImage(imageBytes)
                callback(Result.success(Unit))
            } catch (e: Exception) {
                callback(Result.failure(e))
            }
        }
    }
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:253
- Race condition: Same session synchronization issue. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.
    override fun generateResponse(callback: (Result<String>) -> Unit) {
        scope.launch {
            try {
                val currentSession = session
                    ?: throw IllegalStateException("Session not created")
                val result = currentSession.generateResponse()
                callback(Result.success(result))
            } catch (e: Exception) {
                callback(Result.failure(e))
            }
        }
    }
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:265
- Race condition: Same session synchronization issue. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.
    override fun generateResponseAsync(callback: (Result<Unit>) -> Unit) {
        scope.launch {
            try {
                val currentSession = session
                    ?: throw IllegalStateException("Session not created")
                currentSession.generateResponseAsync()
                callback(Result.success(Unit))
            } catch (e: Exception) {
                callback(Result.failure(e))
            }
        }
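Since the same unsynchronized access is flagged in several methods, a single shared helper is one way to address them all at once. The sketch below is an assumption about how that could look, not code from the PR: `GuardedSession` and `withSession` are hypothetical names, and `S` stands in for InferenceSession.

```kotlin
// Hypothetical generic holder; S stands in for the PR's InferenceSession.
class GuardedSession<S : Any>(private val closeFn: (S) -> Unit) {
    private val engineLock = Any()
    private var session: S? = null

    // Replaces the current session, closing the previous one first.
    fun set(newSession: S) {
        synchronized(engineLock) {
            session?.let(closeFn)
            session = newSession
        }
    }

    fun close() {
        synchronized(engineLock) {
            session?.let(closeFn)
            session = null
        }
    }

    // Runs block while holding the lock, so close() cannot interleave with it.
    fun <R> withSession(block: (S) -> R): R = synchronized(engineLock) {
        val current = session ?: throw IllegalStateException("Session not created")
        block(current)
    }
}
```

Each flagged method would then reduce to something like `guard.withSession { it.addQueryChunk(prompt) }`, keeping the "Session not created" check in one place.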
- Document backend support per platform (Android, iOS, Web, Desktop)
- Clarify that CPU is not supported on Web (MediaPipe limitation)
- Clarify that NPU is Android-only (.litertlm models)
- Add docstrings to PreferredBackend enum in pigeon.dart
- Update proto comments for desktop backend options
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:310
- Race condition: The engine reference is captured inside a synchronized block but then used outside of it when launching the coroutine. If another thread calls createModel or closeModel between capturing the reference and the coroutine starting to collect from it, the engine could be closed while the coroutine is still collecting from its flows.
Consider capturing the flows (partialResults and errors) inside the synchronized block and collecting from those captured references, or keep the engine reference alive with proper lifecycle management.
    synchronized(engineLock) {
        val currentEngine = engine ?: return
        streamJob = scope.launch {
            launch {
                currentEngine.partialResults.collect { (text, done) ->
                    val payload = mapOf("partialResult" to text, "done" to done)
                    withContext(Dispatchers.Main) {
                        events?.success(payload)
                        if (done) {
                            events?.endOfStream()
                        }
                    }
                }
            }
            launch {
                currentEngine.errors.collect { error ->
                    withContext(Dispatchers.Main) {
                        events?.error("ERROR", error.message, null)
                    }
                }
            }
        }
    }
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt
    val currentSession = session
        ?: throw IllegalStateException("Session not created")
    val result = currentSession.generateResponse()
Race condition: Session is accessed without synchronization. If another thread calls closeSession, closeModel, or createSession while this method is executing, the captured session reference could point to a closed session, leading to use-after-close errors.
Consider using synchronized(engineLock) when capturing the session reference to ensure thread-safe access.
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt
    val currentSession = session
        ?: throw IllegalStateException("Session not created")
    currentSession.cancelGeneration()
Race condition: Session is accessed without synchronization. If another thread calls closeSession, closeModel, or createSession while this method is executing, the captured session reference could point to a closed session, leading to use-after-close errors.
Consider using synchronized(engineLock) when capturing the session reference to ensure thread-safe access.
CLAUDE.md
    ├── EngineConfig.kt # Configuration data classes
    ├── EngineFactory.kt # Factory for engine creation
    ├── FlowFactory.kt # SharedFlow factory
Documentation inaccuracy: FlowFactory is documented as a separate file (FlowFactory.kt) but it's actually part of EngineConfig.kt as an object. Update the architecture diagram to reflect that FlowFactory is within EngineConfig.kt, not a separate file.
Suggested change:
    - ├── EngineConfig.kt # Configuration data classes
    - ├── EngineFactory.kt # Factory for engine creation
    - ├── FlowFactory.kt # SharedFlow factory
    + ├── EngineConfig.kt # Configuration data classes and FlowFactory object (SharedFlow factory)
    + ├── EngineFactory.kt # Factory for engine creation
Summary
- InferenceEngine/InferenceSession)
- .task (MediaPipe) and .litertlm (LiteRT-LM) model formats
Architecture
EngineFactory.createFromModelPath() automatically selects the correct engine based on file extension.
PreferredBackend Changes
NPU Support:
Key Files
- engines/InferenceEngine.kt - Engine abstraction
- engines/InferenceSession.kt - Session abstraction
- engines/EngineFactory.kt - Factory with auto-detection
- engines/litertlm/LiteRtLmEngine.kt - LiteRT-LM implementation
- engines/litertlm/LiteRtLmSession.kt - LiteRT-LM session with chunk buffering
- engines/mediapipe/MediaPipeEngine.kt - MediaPipe wrapper
- pigeon.dart - PreferredBackend enum definition
Known Issues (for future PRs)