-
Notifications
You must be signed in to change notification settings - Fork 0
V10 #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… cross-origin isolation requirement Skip Invalid WASM Paths: Only set custom WASM paths if they're explicitly configured and not the default /ort/ (which doesn't exist) Add Error Handling: Added try-catch blocks around ONNX initialization to provide cleaner error messages instead of cryptic WASM failures
…DN for ONNX Runtime
… cross-origin isolation
…ings for cross-origin isolation
…single-threaded execution for cross-origin isolation
…ilability and InferenceSession
….create availability
…ve error handling for module availability
…nt's flat config will exclude the local vendor files. Recreated .eslintignore (removed markdown fences) and ensured it contains: ort/ node_modules/ coverage/
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…cal model override in index.html (since the file doesn't exist), and disabled the custom WASM paths to ensure compatibility with the CDN-loaded ONNX Runtime Web
…the model URL being used and the exact error during session creation. This should help identify if the issue is with the model file, the runtime version, or the loading process
…changes in README
…tput frequency and update state tensor assignment
Summary of ChangesHello @cs-util, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades the application's audio transcription capabilities by integrating local ONNX Runtime WebAssembly for Voice Activity Detection (VAD) and intelligent audio chunking. It streamlines the process of preparing audio for transcription, moving beyond simple file uploads to a more robust, chunk-based approach that leverages VAD to identify speech segments. The changes also include necessary configuration updates for linting and local development, ensuring a smoother workflow for developers. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a comprehensive and robust client-side speech-to-text (STT) pipeline, including local ONNX Runtime (ORT) for voice activity detection (VAD), audio chunking, and transcription logic. The architecture is well-designed, with thoughtful fallbacks for potential failures in audio decoding or VAD. The addition of a script to fetch local ORT assets is also a great improvement for the development environment. I've identified a critical issue where the VAD module still loads the ONNX Runtime from a CDN instead of the newly added local files, which undermines a key goal of this PR. I've also included a few medium-severity suggestions to improve script robustness and code organization. Overall, this is a very impressive and significant feature addition.
| function ensureOrt() { | ||
| if (!ortPromise) { | ||
| // Try multiple import sources. Some CDNs or esm transforms wrap the | ||
| // real export under `default` or produce incomplete modules. Try | ||
| // esm.sh first (fast), then fall back to known CDN ESM builds. | ||
| ortPromise = (async () => { | ||
| const candidates = [ | ||
| 'https://esm.sh/onnxruntime-web@1.19.0', | ||
| 'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.19.0/dist/ort.esm.js', | ||
| 'https://unpkg.com/onnxruntime-web@1.19.0/dist/ort.esm.js', | ||
| ]; | ||
|
|
||
| let lastError = null; | ||
| for (const url of candidates) { | ||
| try { | ||
| const module = await import(url); | ||
| const ort = module?.default || module; | ||
| if (!ort) throw new Error('empty module'); | ||
|
|
||
| // basic sanity: must expose InferenceSession.create | ||
| if ( | ||
| !ort.InferenceSession || | ||
| typeof ort.InferenceSession.create !== 'function' | ||
| ) { | ||
| throw new Error( | ||
| 'incomplete ort module (missing InferenceSession.create)' | ||
| ); | ||
| } | ||
|
|
||
| // configure WASM loader (single-threaded to avoid COOP/COEP) | ||
| if (ort?.env?.wasm) { | ||
| ort.env.wasm.numThreads = 1; | ||
| ort.env.wasm.wasmPaths = DEFAULT_ORT_WASM_PATH; | ||
| } | ||
|
|
||
| // diagnostic: report which URL produced a usable ort | ||
| try { | ||
| console.info('Loaded ONNX Runtime Web from', url); | ||
| console.log('ORT object keys:', Object.keys(ort)); | ||
| console.log('InferenceSession:', ort.InferenceSession); | ||
| console.log( | ||
| 'create function:', | ||
| typeof ort.InferenceSession?.create | ||
| ); | ||
| } catch { | ||
| /* ignore */ | ||
| } | ||
| return ort; | ||
| } catch (err) { | ||
| // try next candidate | ||
| lastError = err; | ||
| } | ||
| } | ||
|
|
||
| console.warn( | ||
| 'Failed to load ONNX Runtime Web from CDN candidates', | ||
| lastError | ||
| ); | ||
| throw new Error('ONNX Runtime Web not available'); | ||
| })(); | ||
| } | ||
| return ortPromise; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function currently loads the ONNX Runtime (ORT) module from public CDNs. A major goal of this pull request is to vendor these assets locally to improve reliability and avoid cross-origin issues, as evidenced by the new /ort directory and the fetch-onnx-dist.sh script. Loading from a CDN defeats this purpose and re-introduces an external dependency.
To align with the PR's goal, you should import the ORT module from the local /ort/ path. This will ensure the application uses the vendored files.
import ort from '/ort/ort.wasm.min.mjs';
/* istanbul ignore next -- runtime depends on onnxruntime-web in browser */
function ensureOrt() {
if (!ortPromise) {
ortPromise = (async () => {
if (!ort || typeof ort.InferenceSession?.create !== 'function') {
throw new Error('Failed to import onnxruntime-web from local path');
}
// configure WASM loader (single-threaded to avoid COOP/COEP)
if (ort?.env?.wasm) {
ort.env.wasm.numThreads = 1;
ort.env.wasm.wasmPaths = DEFAULT_ORT_WASM_PATH;
}
return ort;
})();
}
return ortPromise;
}| '../ort/**', | ||
| 'ort/**', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While not a functional issue, ignoring the ort/ directory is also handled by .eslintignore. For consistency and to have a single source of truth for top-level ignored directories, you might consider removing these lines and relying solely on .eslintignore. However, keeping it here also works, as ESLint will respect both.
| # Use npm pack to download the package tarball | ||
| echo "Downloading npm package..." | ||
| npm pack "onnxruntime-web@$VERSION" >/dev/null 2>&1 | ||
| TARBALL=$(ls onnxruntime-web-*.tgz | head -n1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ls and head to get the tarball filename can be fragile. If there are other .tgz files in the temporary directory for any reason, this might pick the wrong one. A more robust approach is to capture the filename directly from the output of npm pack, which prints the filename of the created tarball on the last line.
| TARBALL=$(ls onnxruntime-web-*.tgz | head -n1) | |
| TARBALL=$(npm pack "onnxruntime-web@$VERSION" | tail -n1) |
| describe('merge helpers', () => { | ||
| test('buildPromptFromTail trims tail characters', () => { | ||
| const prompt = buildPromptFromTail(' Example transcript text '); | ||
| expect(prompt.endsWith('text')).toBe(true); | ||
| }); | ||
|
|
||
| test('mergeChunkResults removes duplicate sentences', () => { | ||
| const merged = mergeChunkResults([ | ||
| { index: 0, text: 'Hello world. This is chunk one.' }, | ||
| { index: 1, text: 'This is chunk one. And here is more.' }, | ||
| ]); | ||
| expect(merged).toContain('Hello world.'); | ||
| expect(merged).toContain('And here is more.'); | ||
| expect(merged).not.toContain('This is chunk one.\nThis is chunk one.'); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const probability = extractSpeechProbability(results); | ||
| probabilities.push(typeof probability === 'number' ? probability : 0); | ||
|
|
||
| stateTensor = results.stateN || stateTensor; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update VAD recurrent state from correct output key
detectSpeechSegments assigns stateTensor = results.stateN || stateTensor; to carry the Silero VAD RNN state forward. The exported model exposes the next state as state/state_out and does not include stateN, so this assignment always falls back to the original zeroed state. Each frame is therefore evaluated as if it were the first frame, breaking speech continuity and causing long utterances to be fragmented or missed. Read the returned state from the actual output key before feeding it back into the next iteration.
Useful? React with 👍 / 👎.
…for detectSpeechSegments
This pull request adds support for local ONNX Runtime (ORT) in development environments, including configuration for the ORT WebAssembly backend and related assets. It also updates ESLint and project ignore settings to accommodate the new
ortdirectory. The most important changes are grouped below:ORT Integration and Configuration:
ort-training-wasm-simd-threaded.mjsmodule for ONNX Runtime WebAssembly support, including logic for both browser and Node.js threaded environments. (ort/ort-training-wasm-simd-threaded.mjs)index.htmlto set default paths for ORT WebAssembly assets and the Silero VAD model, making local development easier by allowing overrides through global window properties. (index.html)Project and Linting Configuration:
.eslintignoreto ignore the newort/,node_modules/, andcoverage/directories, preventing linting and coverage tools from processing these files. (.eslintignore)ort/directory from linting, both at the root and in parent directories. (config/eslint.config.js)