
Conversation


@cs-util cs-util commented Oct 29, 2025

No description provided.

… cross-origin isolation requirement

  • Skip Invalid WASM Paths: Only set custom WASM paths if they're explicitly configured and not the default /ort/ (which doesn't exist).
  • Add Error Handling: Added try-catch blocks around ONNX initialization to provide cleaner error messages instead of cryptic WASM failures.
…single-threaded execution for cross-origin isolation
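The WASM-path and single-threading fixes described above could be sketched as follows. This is a minimal illustration, not the PR's actual code: `configureOrt` is a hypothetical helper, and `ortEnv` stands in for the `env` export of onnxruntime-web, whose `env.wasm.numThreads` and `env.wasm.wasmPaths` settings are real configuration knobs.

```javascript
// Hedged sketch of the ONNX Runtime Web setup described in the commits:
// force single-threaded WASM (avoids the SharedArrayBuffer / cross-origin
// isolation requirement) and only override wasmPaths when a real,
// explicitly configured path is given.
function configureOrt(ortEnv, configuredWasmPath) {
  ortEnv.wasm.numThreads = 1; // no COOP/COEP headers needed for 1 thread
  // Skip the nonexistent default placeholder path.
  const isDefaultPlaceholder = !configuredWasmPath || configuredWasmPath === '/ort/';
  if (!isDefaultPlaceholder) {
    ortEnv.wasm.wasmPaths = configuredWasmPath;
  }
  return ortEnv;
}
```

Wrapping the subsequent `InferenceSession` creation in a try-catch, as the commit notes, would then let the app surface a readable message instead of a raw WASM loading failure.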
@gemini-code-assist

Summary of Changes

Hello @cs-util, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive client-side audio processing and chunking pipeline designed to enhance the efficiency and accuracy of audio transcription. By integrating Voice Activity Detection (VAD) and intelligent chunking strategies, the system can now process large audio files more effectively, optimize API calls, and leverage contextual prompts to improve transcription quality. These changes lay the groundwork for more robust and scalable speech-to-text capabilities within the application.

Highlights

  • ONNX Runtime Web Asset Management: A new script scripts/fetch-onnx-dist.sh and its README.md have been added to simplify downloading and managing ONNX Runtime Web assets locally, ensuring proper WASM loading and avoiding cross-origin issues.
  • Advanced Audio Processing Utilities: New utilities in src/stt/audio.js provide robust client-side audio manipulation, including decoding various audio formats to mono 16kHz PCM, resampling, and encoding PCM data into WAV blobs for transcription.
  • Intelligent Audio Chunking Logic: The src/stt/chunking.js module introduces sophisticated logic for segmenting audio. It normalizes speech segments, packs them into manageable chunks while respecting size and duration limits, and applies overlaps to ensure seamless transcription across chunk boundaries.
  • Voice Activity Detection (VAD) Integration: A Voice Activity Detection (VAD) module (src/stt/vad.js) has been integrated, leveraging ONNX Runtime Web to identify speech segments within audio. This allows for more efficient and targeted transcription by focusing on spoken parts.
  • Orchestrated Chunked Transcription: The src/stt/transcriber.js module now orchestrates the entire chunked transcription process. It handles audio decoding, VAD-based segment detection, chunk planning, concurrent API calls, and intelligent merging of results, including a fallback to byte-based chunking if VAD or audio decoding fails.
  • Contextual Transcription Prompts: The transcribeFile function in src/openai.js has been updated to accept an optional prompt parameter. This enables the chunked transcription process to provide contextual information from previous chunks, improving the accuracy and coherence of transcriptions.
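To make the audio-utilities highlight concrete, encoding mono 16 kHz PCM into a WAV container (as described for src/stt/audio.js) could look like the sketch below. `encodeWavBytes` is an illustrative name, not the PR's actual API; the 44-byte RIFF/WAVE header layout itself is standard.

```javascript
// Sketch: encode mono Float32 PCM samples as 16-bit PCM WAV bytes.
function encodeWavBytes(samples, sampleRate = 16000) {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);  // fmt chunk size
  view.setUint16(20, 1, true);   // audio format: PCM
  view.setUint16(22, 1, true);   // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true);  // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);
  // Clamp floats to [-1, 1] and scale to signed 16-bit.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```

In the browser the resulting bytes could be wrapped as `new Blob([bytes], { type: 'audio/wav' })` before being sent to the transcription API.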
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  Feature                Command                 Description
  Code Review            /gemini review          Performs a code review for the current pull request in its current state.
  Pull Request Summary   /gemini summary         Provides a summary of the current pull request in its current state.
  Comment                @gemini-code-assist     Responds in comments when explicitly tagged, both in pull request comments and review comments.
  Help                   /gemini help            Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive client-side speech-to-text (STT) pipeline. It adds functionality for voice activity detection (VAD), audio decoding and resampling, intelligent chunking of audio files, and sequential transcription with prompting to handle large audio files directly in the browser. This is a significant and well-structured feature addition. My review focuses on improving the robustness of the new shell script, fixing a logic issue in the ONNX runtime path configuration, simplifying the transcription orchestration by removing a misleading concurrency limiter, and improving code organization and maintainability in the chunking logic and tests. Overall, the implementation is impressive, and these changes will help ensure it is robust and easy to maintain.

Comment on lines +115 to +135
const limit = createLimiter(STT_CONFIG.uploadConcurrency);
const results = [];
let accumulatedText = '';

for (const chunk of chunkFiles) {
  const prompt = buildPromptFromTail(accumulatedText);
  const task = limit(async () => {
    const text = await transcribeFile({
      file: chunk.file,
      language,
      prompt,
    });
    return { index: chunk.index, text };
  });

  const result = await task;
  results.push(result);
  accumulatedText = accumulatedText
    ? `${accumulatedText}\n${result.text}`
    : result.text;
}

high

The use of createLimiter here is misleading. Because the for...of loop contains an await for each chunk's transcription task, the chunks are processed sequentially, not concurrently. This is the correct behavior, as the transcription of chunk N+1 is prompted by the result of chunk N. However, this means the concurrency limiter is not actually providing any concurrency, and the uploadConcurrency configuration is unused. This adds unnecessary complexity and could be confusing for future maintenance. I recommend removing the limiter and simplifying the loop to reflect its sequential nature. This would also involve removing the createLimiter function (lines 8-31) and the uploadConcurrency setting from src/stt/config.js.

  const results = [];
  let accumulatedText = '';

  for (const chunk of chunkFiles) {
    const prompt = buildPromptFromTail(accumulatedText);
    const text = await transcribeFile({
      file: chunk.file,
      language,
      prompt,
    });
    const result = { index: chunk.index, text };
    results.push(result);
    accumulatedText = accumulatedText
      ? `${accumulatedText}\n${result.text}`
      : result.text;
  }

…
const chunkDuration = Math.max(60_000, safeMax);

medium

The value 60_000 (1 minute) is a magic number. To improve maintainability and make the configuration more centralized, consider extracting this value into STT_CONFIG in src/stt/config.js (e.g., as fallbackMinChunkMs). A similar magic number 5 * 60_000 is used on line 127, which could also be extracted.

Suggested change
const chunkDuration = Math.max(60_000, safeMax);
const chunkDuration = Math.max(STT_CONFIG.fallbackMinChunkMs, safeMax);
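If the suggestion is adopted, src/stt/config.js might centralize the two durations along these lines. This is a sketch only: `fallbackMinChunkMs` is the name the review proposes, and `fallbackMaxChunkMs` is an assumed companion name for the 5-minute value, not an existing config key.

```javascript
// Sketch of extracting the chunking magic numbers into STT_CONFIG
// (key names are assumptions following the review's suggestion).
const STT_CONFIG = {
  fallbackMinChunkMs: 60_000,      // 1-minute floor for fallback chunks
  fallbackMaxChunkMs: 5 * 60_000,  // 5-minute value the review points at
};

// The fallback duration calculation would then read:
// const chunkDuration = Math.max(STT_CONFIG.fallbackMinChunkMs, safeMax);
```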

Comment on lines +78 to +93
describe('merge helpers', () => {
  test('buildPromptFromTail trims tail characters', () => {
    const prompt = buildPromptFromTail(' Example transcript text ');
    expect(prompt.endsWith('text')).toBe(true);
  });

  test('mergeChunkResults removes duplicate sentences', () => {
    const merged = mergeChunkResults([
      { index: 0, text: 'Hello world. This is chunk one.' },
      { index: 1, text: 'This is chunk one. And here is more.' },
    ]);
    expect(merged).toContain('Hello world.');
    expect(merged).toContain('And here is more.');
    expect(merged).not.toContain('This is chunk one.\nThis is chunk one.');
  });
});

medium

For better project organization and to make tests easier to find, the tests for merge.js helpers should be in their own test file, src/stt/merge.test.js, instead of being included in chunking.test.js.
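The deduplication behavior exercised by those tests could be implemented roughly as follows. This is a hedged sketch of what src/stt/merge.js's `mergeChunkResults` might do (the actual implementation is not shown in this PR page): split each chunk's text into sentences and drop sentences already seen from the overlapping previous chunk.

```javascript
// Sketch: merge ordered chunk transcripts, dropping sentences that were
// duplicated by the audio overlap between adjacent chunks.
function mergeChunkResults(results) {
  const ordered = [...results].sort((a, b) => a.index - b.index);
  const seen = new Set();
  const sentences = [];
  for (const { text } of ordered) {
    // Naive sentence split on terminal punctuation, keeping the delimiter.
    for (const raw of text.match(/[^.!?]+[.!?]?/g) ?? []) {
      const sentence = raw.trim();
      if (sentence && !seen.has(sentence)) {
        seen.add(sentence);
        sentences.push(sentence);
      }
    }
  }
  return sentences.join(' ');
}
```

Exact-match dedup is the simplest policy; a production version would likely also need fuzzy matching, since the transcription model may word the overlapping region slightly differently in each chunk.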

cs-util and others added 3 commits October 29, 2025 10:48
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@cs-util cs-util merged commit 38984d0 into main Oct 31, 2025
6 checks passed
@cs-util cs-util deleted the v7 branch October 31, 2025 03:59