
Conversation


@cs-util cs-util commented Oct 29, 2025

This pull request adds support for local ONNX Runtime (ORT) in development environments, including configuration for the ORT WebAssembly backend and related assets. It also updates ESLint and project ignore settings to accommodate the new ort directory. The most important changes are grouped below:

ORT Integration and Configuration:

  • Added a local copy of the ort-training-wasm-simd-threaded.mjs module for ONNX Runtime WebAssembly support, including logic for both browser and Node.js threaded environments. (ort/ort-training-wasm-simd-threaded.mjs)
  • Updated index.html to set default paths for ORT WebAssembly assets and the Silero VAD model, making local development easier by allowing overrides through global window properties. (index.html)

Project and Linting Configuration:

  • Updated .eslintignore to ignore the new ort/, node_modules/, and coverage/ directories, preventing linting and coverage tools from processing these files. (.eslintignore)
  • Updated ESLint configuration to exclude the ort/ directory from linting, both at the root and in parent directories. (config/eslint.config.js)
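The "default with override" pattern described for index.html can be sketched roughly as follows. The property names below are placeholders for illustration, not the actual globals defined in index.html:

```javascript
// Illustrative sketch: fill in a default only when the page has not already
// provided a value, so local development can override paths up front.
// ORT_WASM_PATH and SILERO_VAD_MODEL_URL are hypothetical names.
function applyOrtDefaults(win) {
  win.ORT_WASM_PATH = win.ORT_WASM_PATH ?? '/ort/';
  win.SILERO_VAD_MODEL_URL = win.SILERO_VAD_MODEL_URL ?? '/models/silero_vad.onnx';
  return win;
}

console.log(applyOrtDefaults({}).ORT_WASM_PATH); // '/ort/'
console.log(
  applyOrtDefaults({ ORT_WASM_PATH: 'https://cdn.example/ort/' }).ORT_WASM_PATH
); // 'https://cdn.example/ort/'
```

In the browser the argument would be `window` itself; a page script that runs before this snippet wins.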

cs-util and others added 29 commits October 28, 2025 17:06
… cross-origin isolation requirement

Skip Invalid WASM Paths: Only set custom WASM paths if they're explicitly configured and not the default /ort/ (which doesn't exist)
Add Error Handling: Added try-catch blocks around ONNX initialization to provide cleaner error messages instead of cryptic WASM failures
…single-threaded execution for cross-origin isolation
…nt's flat config will exclude the local vendor files.

Recreated .eslintignore (removed markdown fences) and ensured it contains:
ort/
node_modules/
coverage/
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…cal model override in index.html (since the file doesn't exist), and disabled the custom WASM paths to ensure compatibility with the CDN-loaded ONNX Runtime Web
…the model URL being used and the exact error during session creation. This should help identify if the issue is with the model file, the runtime version, or the loading process
…tput frequency and update state tensor assignment
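The error-handling change described in the commit messages above might look roughly like this sketch; the wrapper name and message wording are assumptions, not the code in this PR:

```javascript
// Illustrative wrapper: surface a readable error when ONNX Runtime session
// creation fails, instead of a cryptic low-level WASM failure.
async function createSessionSafely(ort, modelUrl) {
  try {
    return await ort.InferenceSession.create(modelUrl);
  } catch (err) {
    throw new Error(
      `Failed to initialize ONNX Runtime session for ${modelUrl}: ${err.message}`
    );
  }
}
```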
@gemini-code-assist

Summary of Changes

Hello @cs-util, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the application's audio transcription capabilities by integrating local ONNX Runtime WebAssembly for Voice Activity Detection (VAD) and intelligent audio chunking. It streamlines the process of preparing audio for transcription, moving beyond simple file uploads to a more robust, chunk-based approach that leverages VAD to identify speech segments. The changes also include necessary configuration updates for linting and local development, ensuring a smoother workflow for developers.

Highlights

  • ONNX Runtime Web (ORT) Integration: Added local copies of various ONNX Runtime WebAssembly modules and related assets to the ort/ directory, enabling local ORT support in development environments.
  • Development Environment Configuration: Updated index.html to set default paths for ORT WebAssembly assets and the Silero VAD model, simplifying local development by allowing overrides via global window properties.
  • Linting and Project Ignore Settings: Modified .eslintignore and config/eslint.config.js to exclude the new ort/ directory, node_modules/, and coverage/ from linting and coverage processing.
  • Advanced Audio Transcription Logic: Introduced new modules (src/stt/audio.js, src/stt/chunking.js, src/stt/config.js, src/stt/merge.js, src/stt/transcriber.js, src/stt/vad.js) to implement sophisticated audio processing, Voice Activity Detection (VAD), and intelligent chunking for improved transcription.
  • OpenAI Transcription API Enhancement: The transcribeFile function in src/openai.js now accepts an optional prompt parameter, allowing for more contextual transcription requests.
  • Automated ONNX Runtime Asset Management: Added a new shell script (scripts/fetch-onnx-dist.sh) and its documentation (scripts/README.md) to automate the downloading and placement of onnxruntime-web distribution assets.
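As an illustration of the optional prompt parameter mentioned above, the request body might be assembled along these lines. This is a hedged sketch with a made-up helper name; the real transcribeFile in src/openai.js may differ:

```javascript
// Sketch only: build a multipart body with an optional contextual prompt.
// Field names follow the OpenAI transcription API; buildTranscriptionForm
// is a hypothetical helper, not part of this PR.
function buildTranscriptionForm(file, { prompt } = {}) {
  const form = new FormData();
  form.append('file', file, 'audio.wav');
  form.append('model', 'whisper-1');
  if (prompt) form.append('prompt', prompt); // e.g. tail of the previous chunk
  return form;
}

const blob = new Blob(['fake audio'], { type: 'audio/wav' });
const form = buildTranscriptionForm(blob, { prompt: 'previous chunk tail' });
console.log(form.get('prompt')); // 'previous chunk tail'
```

Passing the tail of the previous chunk as the prompt is what makes chunk-boundary transcriptions consistent with each other.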
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive and robust client-side speech-to-text (STT) pipeline, including local ONNX Runtime (ORT) for voice activity detection (VAD), audio chunking, and transcription logic. The architecture is well-designed, with thoughtful fallbacks for potential failures in audio decoding or VAD. The addition of a script to fetch local ORT assets is also a great improvement for the development environment. I've identified a critical issue where the VAD module still loads the ONNX Runtime from a CDN instead of the newly added local files, which undermines a key goal of this PR. I've also included a few medium-severity suggestions to improve script robustness and code organization. Overall, this is a very impressive and significant feature addition.

Comment on lines +12 to +74
function ensureOrt() {
  if (!ortPromise) {
    // Try multiple import sources. Some CDNs or esm transforms wrap the
    // real export under `default` or produce incomplete modules. Try
    // esm.sh first (fast), then fall back to known CDN ESM builds.
    ortPromise = (async () => {
      const candidates = [
        'https://esm.sh/onnxruntime-web@1.19.0',
        'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.19.0/dist/ort.esm.js',
        'https://unpkg.com/onnxruntime-web@1.19.0/dist/ort.esm.js',
      ];

      let lastError = null;
      for (const url of candidates) {
        try {
          const module = await import(url);
          const ort = module?.default || module;
          if (!ort) throw new Error('empty module');

          // basic sanity: must expose InferenceSession.create
          if (
            !ort.InferenceSession ||
            typeof ort.InferenceSession.create !== 'function'
          ) {
            throw new Error(
              'incomplete ort module (missing InferenceSession.create)'
            );
          }

          // configure WASM loader (single-threaded to avoid COOP/COEP)
          if (ort?.env?.wasm) {
            ort.env.wasm.numThreads = 1;
            ort.env.wasm.wasmPaths = DEFAULT_ORT_WASM_PATH;
          }

          // diagnostic: report which URL produced a usable ort
          try {
            console.info('Loaded ONNX Runtime Web from', url);
            console.log('ORT object keys:', Object.keys(ort));
            console.log('InferenceSession:', ort.InferenceSession);
            console.log(
              'create function:',
              typeof ort.InferenceSession?.create
            );
          } catch {
            /* ignore */
          }
          return ort;
        } catch (err) {
          // try next candidate
          lastError = err;
        }
      }

      console.warn(
        'Failed to load ONNX Runtime Web from CDN candidates',
        lastError
      );
      throw new Error('ONNX Runtime Web not available');
    })();
  }
  return ortPromise;
}

critical

This function currently loads the ONNX Runtime (ORT) module from public CDNs. A major goal of this pull request is to vendor these assets locally to improve reliability and avoid cross-origin issues, as evidenced by the new /ort directory and the fetch-onnx-dist.sh script. Loading from a CDN defeats this purpose and re-introduces an external dependency.

To align with the PR's goal, you should import the ORT module from the local /ort/ path. This will ensure the application uses the vendored files.

import ort from '/ort/ort.wasm.min.mjs';

/* istanbul ignore next -- runtime depends on onnxruntime-web in browser */
function ensureOrt() {
  if (!ortPromise) {
    ortPromise = (async () => {
      if (!ort || typeof ort.InferenceSession?.create !== 'function') {
        throw new Error('Failed to import onnxruntime-web from local path');
      }

      // configure WASM loader (single-threaded to avoid COOP/COEP)
      if (ort?.env?.wasm) {
        ort.env.wasm.numThreads = 1;
        ort.env.wasm.wasmPaths = DEFAULT_ORT_WASM_PATH;
      }

      return ort;
    })();
  }
  return ortPromise;
}

Comment on lines +15 to +16
'../ort/**',
'ort/**',

medium

While not a functional issue, ignoring the ort/ directory is also handled by .eslintignore. For consistency and to have a single source of truth for top-level ignored directories, you might consider removing these lines and relying solely on .eslintignore. However, keeping it here also works, as ESLint will respect both.

# Use npm pack to download the package tarball
echo "Downloading npm package..."
npm pack "onnxruntime-web@$VERSION" >/dev/null 2>&1
TARBALL=$(ls onnxruntime-web-*.tgz | head -n1)

medium

Using ls and head to get the tarball filename can be fragile. If there are other .tgz files in the temporary directory for any reason, this might pick the wrong one. A more robust approach is to capture the filename directly from the output of npm pack, which prints the filename of the created tarball on the last line.

Suggested change
- TARBALL=$(ls onnxruntime-web-*.tgz | head -n1)
+ TARBALL=$(npm pack "onnxruntime-web@$VERSION" | tail -n1)

Comment on lines +78 to +93
describe('merge helpers', () => {
  test('buildPromptFromTail trims tail characters', () => {
    const prompt = buildPromptFromTail(' Example transcript text ');
    expect(prompt.endsWith('text')).toBe(true);
  });

  test('mergeChunkResults removes duplicate sentences', () => {
    const merged = mergeChunkResults([
      { index: 0, text: 'Hello world. This is chunk one.' },
      { index: 1, text: 'This is chunk one. And here is more.' },
    ]);
    expect(merged).toContain('Hello world.');
    expect(merged).toContain('And here is more.');
    expect(merged).not.toContain('This is chunk one.\nThis is chunk one.');
  });
});

medium

These tests for merge.js helpers are currently located in the test file for chunking.js. For better organization and maintainability, it would be ideal to move these tests to a dedicated src/stt/merge.test.js file.
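For readers skimming the diff, the deduplication behavior these tests exercise can be sketched as follows. This is a deliberately simplified assumption about how overlap-aware merging works, not the actual mergeChunkResults implementation in src/stt/merge.js:

```javascript
// Sketch: merge chunk transcripts in index order, dropping sentences that
// already appeared (overlapping chunks repeat boundary sentences).
function mergeChunks(chunks) {
  const sorted = [...chunks].sort((a, b) => a.index - b.index);
  const seen = new Set();
  const out = [];
  for (const { text } of sorted) {
    // Split on whitespace that follows a period (naive sentence boundary).
    for (const sentence of text.split(/(?<=\.)\s+/)) {
      const key = sentence.trim();
      if (key && !seen.has(key)) {
        seen.add(key);
        out.push(key);
      }
    }
  }
  return out.join(' ');
}

console.log(mergeChunks([
  { index: 0, text: 'Hello world. This is chunk one.' },
  { index: 1, text: 'This is chunk one. And here is more.' },
]));
// → 'Hello world. This is chunk one. And here is more.'
```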


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +274 to +278
  const probability = extractSpeechProbability(results);
  probabilities.push(typeof probability === 'number' ? probability : 0);

  stateTensor = results.stateN || stateTensor;
}


P1: Update VAD recurrent state from the correct output key

detectSpeechSegments assigns stateTensor = results.stateN || stateTensor; to carry the Silero VAD RNN state forward. The exported model exposes the next state as state/state_out and does not include stateN, so this assignment always falls back to the original zeroed state. Each frame is therefore evaluated as if it were the first frame, breaking speech continuity and causing long utterances to be fragmented or missed. Read the returned state from the actual output key before feeding it back into the next iteration.
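A minimal sketch of the suggested fix, assuming the output keys named in the review (state / state_out) are the ones the exported model actually provides:

```javascript
// Read the recurrent state from the keys the Silero VAD model actually
// exposes; `stateN` is never present, so the old code always fell back to
// the zeroed initial state on every frame.
function nextState(results, previousState) {
  return results.state ?? results.state_out ?? previousState;
}

// Inside the frame loop this would replace the buggy assignment
//   stateTensor = results.stateN || stateTensor;
// with
//   stateTensor = nextState(results, stateTensor);
```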


@cs-util cs-util merged commit 9ceb601 into main Oct 31, 2025
6 checks passed
@cs-util cs-util deleted the v10 branch October 31, 2025 03:59