
Conversation


@cs-util cs-util commented Oct 29, 2025

This pull request adds support for local ONNX Runtime (ORT) in development environments, including configuration for the ORT WebAssembly backend and related assets. It also updates ESLint and project ignore settings to accommodate the new ort directory. The most important changes are grouped below:

ORT Integration and Configuration:

  • Added a local copy of the ort-training-wasm-simd-threaded.mjs module for ONNX Runtime WebAssembly support, including logic for both browser and Node.js threaded environments. (ort/ort-training-wasm-simd-threaded.mjs)
  • Updated index.html to set default paths for ORT WebAssembly assets and the Silero VAD model, making local development easier by allowing overrides through global window properties. (index.html)

Project and Linting Configuration:

  • Updated .eslintignore to ignore the new ort/, node_modules/, and coverage/ directories, preventing linting and coverage tools from processing these files. (.eslintignore)
  • Updated ESLint configuration to exclude the ort/ directory from linting, both at the root and in parent directories. (config/eslint.config.js)
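The "default with override" pattern described for index.html can be sketched roughly as follows. The property names below are placeholders for illustration, not the actual globals defined in index.html:

```javascript
// Illustrative sketch: fill in a default only when the page has not already
// provided a value, so local development can override paths up front.
// ORT_WASM_PATH and SILERO_VAD_MODEL_URL are hypothetical names.
function applyOrtDefaults(win) {
  win.ORT_WASM_PATH = win.ORT_WASM_PATH ?? '/ort/';
  win.SILERO_VAD_MODEL_URL = win.SILERO_VAD_MODEL_URL ?? '/models/silero_vad.onnx';
  return win;
}

console.log(applyOrtDefaults({}).ORT_WASM_PATH); // '/ort/'
console.log(
  applyOrtDefaults({ ORT_WASM_PATH: 'https://cdn.example/ort/' }).ORT_WASM_PATH
); // 'https://cdn.example/ort/'
```

In the browser the argument would be `window` itself; a page script that runs before this snippet wins.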

cs-util and others added 29 commits October 28, 2025 17:06
… cross-origin isolation requirement

Skip Invalid WASM Paths: Only set custom WASM paths if they're explicitly configured and not the default /ort/ (which doesn't exist)
Add Error Handling: Added try-catch blocks around ONNX initialization to provide cleaner error messages instead of cryptic WASM failures
…single-threaded execution for cross-origin isolation
…nt's flat config will exclude the local vendor files.

Recreated .eslintignore (removed markdown fences) and ensured it contains:
ort/
node_modules/
coverage/
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…cal model override in index.html (since the file doesn't exist), and disabled the custom WASM paths to ensure compatibility with the CDN-loaded ONNX Runtime Web
…the model URL being used and the exact error during session creation. This should help identify if the issue is with the model file, the runtime version, or the loading process
…tput frequency and update state tensor assignment
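The error-handling change described in the commit messages above might look roughly like this sketch; the wrapper name and message wording are assumptions, not the code in this PR:

```javascript
// Illustrative wrapper: surface a readable error when ONNX Runtime session
// creation fails, instead of a cryptic low-level WASM failure.
async function createSessionSafely(ort, modelUrl) {
  try {
    return await ort.InferenceSession.create(modelUrl);
  } catch (err) {
    throw new Error(
      `Failed to initialize ONNX Runtime session for ${modelUrl}: ${err.message}`
    );
  }
}
```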
@gemini-code-assist

Summary of Changes

Hello @cs-util, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the application's audio transcription capabilities by integrating local ONNX Runtime WebAssembly for Voice Activity Detection (VAD) and intelligent audio chunking. It streamlines the process of preparing audio for transcription, moving beyond simple file uploads to a more robust, chunk-based approach that leverages VAD to identify speech segments. The changes also include necessary configuration updates for linting and local development, ensuring a smoother workflow for developers.

Highlights

  • ONNX Runtime Web (ORT) Integration: Added local copies of various ONNX Runtime WebAssembly modules and related assets to the ort/ directory, enabling local ORT support in development environments.
  • Development Environment Configuration: Updated index.html to set default paths for ORT WebAssembly assets and the Silero VAD model, simplifying local development by allowing overrides via global window properties.
  • Linting and Project Ignore Settings: Modified .eslintignore and config/eslint.config.js to exclude the new ort/ directory, node_modules/, and coverage/ from linting and coverage processing.
  • Advanced Audio Transcription Logic: Introduced new modules (src/stt/audio.js, src/stt/chunking.js, src/stt/config.js, src/stt/merge.js, src/stt/transcriber.js, src/stt/vad.js) to implement sophisticated audio processing, Voice Activity Detection (VAD), and intelligent chunking for improved transcription.
  • OpenAI Transcription API Enhancement: The transcribeFile function in src/openai.js now accepts an optional prompt parameter, allowing for more contextual transcription requests.
  • Automated ONNX Runtime Asset Management: Added a new shell script (scripts/fetch-onnx-dist.sh) and its documentation (scripts/README.md) to automate the downloading and placement of onnxruntime-web distribution assets.
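As an illustration of the optional prompt parameter mentioned above, the request body might be assembled along these lines. This is a hedged sketch with a made-up helper name; the real transcribeFile in src/openai.js may differ:

```javascript
// Sketch only: build a multipart body with an optional contextual prompt.
// Field names follow the OpenAI transcription API; buildTranscriptionForm
// is a hypothetical helper, not part of this PR.
function buildTranscriptionForm(file, { prompt } = {}) {
  const form = new FormData();
  form.append('file', file, 'audio.wav');
  form.append('model', 'whisper-1');
  if (prompt) form.append('prompt', prompt); // e.g. tail of the previous chunk
  return form;
}

const blob = new Blob(['fake audio'], { type: 'audio/wav' });
const form = buildTranscriptionForm(blob, { prompt: 'previous chunk tail' });
console.log(form.get('prompt')); // 'previous chunk tail'
```

Passing the tail of the previous chunk as the prompt is what makes chunk-boundary transcriptions consistent with each other.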
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive and robust client-side speech-to-text (STT) pipeline, including local ONNX Runtime (ORT) for voice activity detection (VAD), audio chunking, and transcription logic. The architecture is well-designed, with thoughtful fallbacks for potential failures in audio decoding or VAD. The addition of a script to fetch local ORT assets is also a great improvement for the development environment. I've identified a critical issue where the VAD module still loads the ONNX Runtime from a CDN instead of the newly added local files, which undermines a key goal of this PR. I've also included a few medium-severity suggestions to improve script robustness and code organization. Overall, this is a very impressive and significant feature addition.

Comment on lines +12 to +74
function ensureOrt() {
  if (!ortPromise) {
    // Try multiple import sources. Some CDNs or esm transforms wrap the
    // real export under `default` or produce incomplete modules. Try
    // esm.sh first (fast), then fall back to known CDN ESM builds.
    ortPromise = (async () => {
      const candidates = [
        'https://esm.sh/onnxruntime-web@1.19.0',
        'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.19.0/dist/ort.esm.js',
        'https://unpkg.com/onnxruntime-web@1.19.0/dist/ort.esm.js',
      ];

      let lastError = null;
      for (const url of candidates) {
        try {
          const module = await import(url);
          const ort = module?.default || module;
          if (!ort) throw new Error('empty module');

          // basic sanity: must expose InferenceSession.create
          if (
            !ort.InferenceSession ||
            typeof ort.InferenceSession.create !== 'function'
          ) {
            throw new Error(
              'incomplete ort module (missing InferenceSession.create)'
            );
          }

          // configure WASM loader (single-threaded to avoid COOP/COEP)
          if (ort?.env?.wasm) {
            ort.env.wasm.numThreads = 1;
            ort.env.wasm.wasmPaths = DEFAULT_ORT_WASM_PATH;
          }

          // diagnostic: report which URL produced a usable ort
          try {
            console.info('Loaded ONNX Runtime Web from', url);
            console.log('ORT object keys:', Object.keys(ort));
            console.log('InferenceSession:', ort.InferenceSession);
            console.log(
              'create function:',
              typeof ort.InferenceSession?.create
            );
          } catch {
            /* ignore */
          }
          return ort;
        } catch (err) {
          // try next candidate
          lastError = err;
        }
      }

      console.warn(
        'Failed to load ONNX Runtime Web from CDN candidates',
        lastError
      );
      throw new Error('ONNX Runtime Web not available');
    })();
  }
  return ortPromise;
}

critical

This function currently loads the ONNX Runtime (ORT) module from public CDNs. A major goal of this pull request is to vendor these assets locally to improve reliability and avoid cross-origin issues, as evidenced by the new /ort directory and the fetch-onnx-dist.sh script. Loading from a CDN defeats this purpose and re-introduces an external dependency.

To align with the PR's goal, you should import the ORT module from the local /ort/ path. This will ensure the application uses the vendored files.

import ort from '/ort/ort.wasm.min.mjs';

/* istanbul ignore next -- runtime depends on onnxruntime-web in browser */
function ensureOrt() {
  if (!ortPromise) {
    ortPromise = (async () => {
      if (!ort || typeof ort.InferenceSession?.create !== 'function') {
        throw new Error('Failed to import onnxruntime-web from local path');
      }

      // configure WASM loader (single-threaded to avoid COOP/COEP)
      if (ort?.env?.wasm) {
        ort.env.wasm.numThreads = 1;
        ort.env.wasm.wasmPaths = DEFAULT_ORT_WASM_PATH;
      }

      return ort;
    })();
  }
  return ortPromise;
}

Comment on lines +15 to +16
'../ort/**',
'ort/**',

medium

While not a functional issue, ignoring the ort/ directory is also handled by .eslintignore. For consistency and to have a single source of truth for top-level ignored directories, you might consider removing these lines and relying solely on .eslintignore. However, keeping it here also works, as ESLint will respect both.

# Use npm pack to download the package tarball
echo "Downloading npm package..."
npm pack "onnxruntime-web@$VERSION" >/dev/null 2>&1
TARBALL=$(ls onnxruntime-web-*.tgz | head -n1)

medium

Using ls and head to get the tarball filename can be fragile. If there are other .tgz files in the temporary directory for any reason, this might pick the wrong one. A more robust approach is to capture the filename directly from the output of npm pack, which prints the filename of the created tarball on the last line.

Suggested change
- TARBALL=$(ls onnxruntime-web-*.tgz | head -n1)
+ TARBALL=$(npm pack "onnxruntime-web@$VERSION" | tail -n1)

Comment on lines +78 to +93
describe('merge helpers', () => {
  test('buildPromptFromTail trims tail characters', () => {
    const prompt = buildPromptFromTail(' Example transcript text ');
    expect(prompt.endsWith('text')).toBe(true);
  });

  test('mergeChunkResults removes duplicate sentences', () => {
    const merged = mergeChunkResults([
      { index: 0, text: 'Hello world. This is chunk one.' },
      { index: 1, text: 'This is chunk one. And here is more.' },
    ]);
    expect(merged).toContain('Hello world.');
    expect(merged).toContain('And here is more.');
    expect(merged).not.toContain('This is chunk one.\nThis is chunk one.');
  });
});

medium

These tests for merge.js helpers are currently located in the test file for chunking.js. For better organization and maintainability, it would be ideal to move these tests to a dedicated src/stt/merge.test.js file.
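For readers skimming the diff, the deduplication behavior these tests exercise can be sketched as follows. This is a deliberately simplified assumption about how overlap-aware merging works, not the actual mergeChunkResults implementation in src/stt/merge.js:

```javascript
// Sketch: merge chunk transcripts in index order, dropping sentences that
// already appeared (overlapping chunks repeat boundary sentences).
function mergeChunks(chunks) {
  const sorted = [...chunks].sort((a, b) => a.index - b.index);
  const seen = new Set();
  const out = [];
  for (const { text } of sorted) {
    // Split on whitespace that follows a period (naive sentence boundary).
    for (const sentence of text.split(/(?<=\.)\s+/)) {
      const key = sentence.trim();
      if (key && !seen.has(key)) {
        seen.add(key);
        out.push(key);
      }
    }
  }
  return out.join(' ');
}

console.log(mergeChunks([
  { index: 0, text: 'Hello world. This is chunk one.' },
  { index: 1, text: 'This is chunk one. And here is more.' },
]));
// → 'Hello world. This is chunk one. And here is more.'
```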


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +274 to +278
  const probability = extractSpeechProbability(results);
  probabilities.push(typeof probability === 'number' ? probability : 0);

  stateTensor = results.stateN || stateTensor;
}


P1: Update VAD recurrent state from the correct output key

detectSpeechSegments assigns stateTensor = results.stateN || stateTensor; to carry the Silero VAD RNN state forward. The exported model exposes the next state as state/state_out and does not include stateN, so this assignment always falls back to the original zeroed state. Each frame is therefore evaluated as if it were the first frame, breaking speech continuity and causing long utterances to be fragmented or missed. Read the returned state from the actual output key before feeding it back into the next iteration.
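A minimal sketch of the suggested fix, assuming the output keys named in the review (state / state_out) are the ones the exported model actually provides:

```javascript
// Read the recurrent state from the keys the Silero VAD model actually
// exposes; `stateN` is never present, so the old code always fell back to
// the zeroed initial state on every frame.
function nextState(results, previousState) {
  return results.state ?? results.state_out ?? previousState;
}

// Inside the frame loop this would replace the buggy assignment
//   stateTensor = results.stateN || stateTensor;
// with
//   stateTensor = nextState(results, stateTensor);
```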


@cs-util cs-util merged commit 9ceb601 into main Oct 31, 2025
6 checks passed
@cs-util cs-util deleted the v10 branch October 31, 2025 03:59