
Conversation


@cs-util cs-util commented Oct 29, 2025

No description provided.

… cross-origin isolation requirement

  • Skip Invalid WASM Paths: Only set custom WASM paths if they're explicitly configured and not the default /ort/ (which doesn't exist).
  • Add Error Handling: Added try-catch blocks around ONNX initialization to provide cleaner error messages instead of cryptic WASM failures.
…single-threaded execution for cross-origin isolation
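The WASM-path and single-threading fixes described above could be sketched as follows. This is a minimal illustration, not the PR's actual code: `configureOrt` is a hypothetical helper, and `ortEnv` stands in for the `env` export of onnxruntime-web, whose `env.wasm.numThreads` and `env.wasm.wasmPaths` settings are real configuration knobs.

```javascript
// Hedged sketch of the ONNX Runtime Web setup described in the commits:
// force single-threaded WASM (avoids the SharedArrayBuffer / cross-origin
// isolation requirement) and only override wasmPaths when a real,
// explicitly configured path is given.
function configureOrt(ortEnv, configuredWasmPath) {
  ortEnv.wasm.numThreads = 1; // no COOP/COEP headers needed for 1 thread
  // Skip the nonexistent default placeholder path.
  const isDefaultPlaceholder = !configuredWasmPath || configuredWasmPath === '/ort/';
  if (!isDefaultPlaceholder) {
    ortEnv.wasm.wasmPaths = configuredWasmPath;
  }
  return ortEnv;
}
```

Wrapping the subsequent `InferenceSession` creation in a try-catch, as the commit notes, would then let the app surface a readable message instead of a raw WASM loading failure.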
@gemini-code-assist

Summary of Changes

Hello @cs-util, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive client-side audio processing and chunking pipeline designed to enhance the efficiency and accuracy of audio transcription. By integrating Voice Activity Detection (VAD) and intelligent chunking strategies, the system can now process large audio files more effectively, optimize API calls, and leverage contextual prompts to improve transcription quality. These changes lay the groundwork for more robust and scalable speech-to-text capabilities within the application.

Highlights

  • ONNX Runtime Web Asset Management: A new script scripts/fetch-onnx-dist.sh and its README.md have been added to simplify downloading and managing ONNX Runtime Web assets locally, ensuring proper WASM loading and avoiding cross-origin issues.
  • Advanced Audio Processing Utilities: New utilities in src/stt/audio.js provide robust client-side audio manipulation, including decoding various audio formats to mono 16kHz PCM, resampling, and encoding PCM data into WAV blobs for transcription.
  • Intelligent Audio Chunking Logic: The src/stt/chunking.js module introduces sophisticated logic for segmenting audio. It normalizes speech segments, packs them into manageable chunks while respecting size and duration limits, and applies overlaps to ensure seamless transcription across chunk boundaries.
  • Voice Activity Detection (VAD) Integration: A Voice Activity Detection (VAD) module (src/stt/vad.js) has been integrated, leveraging ONNX Runtime Web to identify speech segments within audio. This allows for more efficient and targeted transcription by focusing on spoken parts.
  • Orchestrated Chunked Transcription: The src/stt/transcriber.js module now orchestrates the entire chunked transcription process. It handles audio decoding, VAD-based segment detection, chunk planning, concurrent API calls, and intelligent merging of results, including a fallback to byte-based chunking if VAD or audio decoding fails.
  • Contextual Transcription Prompts: The transcribeFile function in src/openai.js has been updated to accept an optional prompt parameter. This enables the chunked transcription process to provide contextual information from previous chunks, improving the accuracy and coherence of transcriptions.
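To make the audio-utilities highlight concrete, encoding mono 16 kHz PCM into a WAV container (as described for src/stt/audio.js) could look like the sketch below. `encodeWavBytes` is an illustrative name, not the PR's actual API; the 44-byte RIFF/WAVE header layout itself is standard.

```javascript
// Sketch: encode mono Float32 PCM samples as 16-bit PCM WAV bytes.
function encodeWavBytes(samples, sampleRate = 16000) {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);  // fmt chunk size
  view.setUint16(20, 1, true);   // audio format: PCM
  view.setUint16(22, 1, true);   // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true);  // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);
  // Clamp floats to [-1, 1] and scale to signed 16-bit.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```

In the browser the resulting bytes could be wrapped as `new Blob([bytes], { type: 'audio/wav' })` before being sent to the transcription API.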
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  Feature                Command                 Description
  Code Review            /gemini review          Performs a code review for the current pull request in its current state.
  Pull Request Summary   /gemini summary         Provides a summary of the current pull request in its current state.
  Comment                @gemini-code-assist     Responds in comments when explicitly tagged, both in pull request comments and review comments.
  Help                   /gemini help            Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive client-side speech-to-text (STT) pipeline. It adds functionality for voice activity detection (VAD), audio decoding and resampling, intelligent chunking of audio files, and sequential transcription with prompting to handle large audio files directly in the browser. This is a significant and well-structured feature addition. My review focuses on improving the robustness of the new shell script, fixing a logic issue in the ONNX runtime path configuration, simplifying the transcription orchestration by removing a misleading concurrency limiter, and improving code organization and maintainability in the chunking logic and tests. Overall, the implementation is impressive, and these changes will help ensure it is robust and easy to maintain.

Comment on lines +115 to +135
const limit = createLimiter(STT_CONFIG.uploadConcurrency);
const results = [];
let accumulatedText = '';

for (const chunk of chunkFiles) {
  const prompt = buildPromptFromTail(accumulatedText);
  const task = limit(async () => {
    const text = await transcribeFile({
      file: chunk.file,
      language,
      prompt,
    });
    return { index: chunk.index, text };
  });

  const result = await task;
  results.push(result);
  accumulatedText = accumulatedText
    ? `${accumulatedText}\n${result.text}`
    : result.text;
}

high

The use of createLimiter here is misleading. Because the for...of loop contains an await for each chunk's transcription task, the chunks are processed sequentially, not concurrently. This is the correct behavior, as the transcription of chunk N+1 is prompted by the result of chunk N. However, this means the concurrency limiter is not actually providing any concurrency, and the uploadConcurrency configuration is unused. This adds unnecessary complexity and could be confusing for future maintenance. I recommend removing the limiter and simplifying the loop to reflect its sequential nature. This would also involve removing the createLimiter function (lines 8-31) and the uploadConcurrency setting from src/stt/config.js.

  const results = [];
  let accumulatedText = '';

  for (const chunk of chunkFiles) {
    const prompt = buildPromptFromTail(accumulatedText);
    const text = await transcribeFile({
      file: chunk.file,
      language,
      prompt,
    });
    const result = { index: chunk.index, text };
    results.push(result);
    accumulatedText = accumulatedText
      ? `${accumulatedText}\n${result.text}`
      : result.text;
  }

…
const chunkDuration = Math.max(60_000, safeMax);

medium

The value 60_000 (1 minute) is a magic number. To improve maintainability and make the configuration more centralized, consider extracting this value into STT_CONFIG in src/stt/config.js (e.g., as fallbackMinChunkMs). A similar magic number 5 * 60_000 is used on line 127, which could also be extracted.

Suggested change
const chunkDuration = Math.max(60_000, safeMax);
const chunkDuration = Math.max(STT_CONFIG.fallbackMinChunkMs, safeMax);
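If the suggestion is adopted, src/stt/config.js might centralize the two durations along these lines. This is a sketch only: `fallbackMinChunkMs` is the name the review proposes, and `fallbackMaxChunkMs` is an assumed companion name for the 5-minute value, not an existing config key.

```javascript
// Sketch of extracting the chunking magic numbers into STT_CONFIG
// (key names are assumptions following the review's suggestion).
const STT_CONFIG = {
  fallbackMinChunkMs: 60_000,      // 1-minute floor for fallback chunks
  fallbackMaxChunkMs: 5 * 60_000,  // 5-minute value the review points at
};

// The fallback duration calculation would then read:
// const chunkDuration = Math.max(STT_CONFIG.fallbackMinChunkMs, safeMax);
```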

Comment on lines +78 to +93
describe('merge helpers', () => {
  test('buildPromptFromTail trims tail characters', () => {
    const prompt = buildPromptFromTail(' Example transcript text ');
    expect(prompt.endsWith('text')).toBe(true);
  });

  test('mergeChunkResults removes duplicate sentences', () => {
    const merged = mergeChunkResults([
      { index: 0, text: 'Hello world. This is chunk one.' },
      { index: 1, text: 'This is chunk one. And here is more.' },
    ]);
    expect(merged).toContain('Hello world.');
    expect(merged).toContain('And here is more.');
    expect(merged).not.toContain('This is chunk one.\nThis is chunk one.');
  });
});

medium

For better project organization and to make tests easier to find, the tests for merge.js helpers should be in their own test file, src/stt/merge.test.js, instead of being included in chunking.test.js.
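The deduplication behavior exercised by those tests could be implemented roughly as follows. This is a hedged sketch of what src/stt/merge.js's `mergeChunkResults` might do (the actual implementation is not shown in this PR page): split each chunk's text into sentences and drop sentences already seen from the overlapping previous chunk.

```javascript
// Sketch: merge ordered chunk transcripts, dropping sentences that were
// duplicated by the audio overlap between adjacent chunks.
function mergeChunkResults(results) {
  const ordered = [...results].sort((a, b) => a.index - b.index);
  const seen = new Set();
  const sentences = [];
  for (const { text } of ordered) {
    // Naive sentence split on terminal punctuation, keeping the delimiter.
    for (const raw of text.match(/[^.!?]+[.!?]?/g) ?? []) {
      const sentence = raw.trim();
      if (sentence && !seen.has(sentence)) {
        seen.add(sentence);
        sentences.push(sentence);
      }
    }
  }
  return sentences.join(' ');
}
```

Exact-match dedup is the simplest policy; a production version would likely also need fuzzy matching, since the transcription model may word the overlapping region slightly differently in each chunk.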

cs-util and others added 3 commits October 29, 2025 10:48
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@cs-util cs-util merged commit 38984d0 into main Oct 31, 2025
6 checks passed
@cs-util cs-util deleted the v7 branch October 31, 2025 03:59