V7 #10

Conversation
- … cross-origin isolation requirement. Skip Invalid WASM Paths: only set custom WASM paths if they're explicitly configured and not the default /ort/ (which doesn't exist). Add Error Handling: added try-catch blocks around ONNX initialization to provide cleaner error messages instead of cryptic WASM failures.
- …DN for ONNX Runtime
- … cross-origin isolation
- …ings for cross-origin isolation
- …single-threaded execution for cross-origin isolation
- …ilability and InferenceSession
- ….create availability
- …ve error handling for module availability
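The WASM-path guard and error handling described in the first commit above could look roughly like the following sketch. Only the `/ort/` default and the try-catch-around-initialization idea come from the commit message; the function names (`resolveWasmPath`, `initOnnx`), the `configuredPath` parameter, and the model path are hypothetical. `ort.env.wasm.wasmPaths` and `ort.InferenceSession.create` are real onnxruntime-web APIs.

```javascript
// Default path shipped in config that does not actually exist in the deployment
// (per the commit message), so it must never be passed to the runtime.
const DEFAULT_WASM_PATH = '/ort/';

// Skip invalid WASM paths: only honor an explicit, non-default configuration.
function resolveWasmPath(configuredPath) {
  if (!configuredPath || configuredPath === DEFAULT_WASM_PATH) {
    return null; // let onnxruntime-web fall back to its bundled defaults
  }
  return configuredPath;
}

// Hypothetical initialization wrapper: the model path is illustrative only.
async function initOnnx(ort, configuredPath) {
  const wasmPath = resolveWasmPath(configuredPath);
  if (wasmPath) {
    ort.env.wasm.wasmPaths = wasmPath;
  }
  try {
    return await ort.InferenceSession.create('/models/vad.onnx');
  } catch (err) {
    // Surface a readable message instead of a cryptic WASM failure.
    throw new Error(`ONNX initialization failed: ${err.message}`);
  }
}
```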
Summary of Changes

Hello @cs-util, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a comprehensive client-side audio processing and chunking pipeline designed to enhance the efficiency and accuracy of audio transcription. By integrating Voice Activity Detection (VAD) and intelligent chunking strategies, the system can now process large audio files more effectively, optimize API calls, and leverage contextual prompts to improve transcription quality. These changes lay the groundwork for more robust and scalable speech-to-text capabilities within the application.
Code Review
This pull request introduces a comprehensive client-side speech-to-text (STT) pipeline. It adds functionality for voice activity detection (VAD), audio decoding and resampling, intelligent chunking of audio files, and sequential transcription with prompting to handle large audio files directly in the browser. This is a significant and well-structured feature addition. My review focuses on improving the robustness of the new shell script, fixing a logic issue in the ONNX runtime path configuration, simplifying the transcription orchestration by removing a misleading concurrency limiter, and improving code organization and maintainability in the chunking logic and tests. Overall, the implementation is impressive, and these changes will help ensure it is robust and easy to maintain.
```js
const limit = createLimiter(STT_CONFIG.uploadConcurrency);
const results = [];
let accumulatedText = '';

for (const chunk of chunkFiles) {
  const prompt = buildPromptFromTail(accumulatedText);
  const task = limit(async () => {
    const text = await transcribeFile({
      file: chunk.file,
      language,
      prompt,
    });
    return { index: chunk.index, text };
  });

  const result = await task;
  results.push(result);
  accumulatedText = accumulatedText
    ? `${accumulatedText}\n${result.text}`
    : result.text;
}
```
The use of createLimiter here is misleading. Because the for...of loop contains an await for each chunk's transcription task, the chunks are processed sequentially, not concurrently. This is the correct behavior, as the transcription of chunk N+1 is prompted by the result of chunk N. However, this means the concurrency limiter is not actually providing any concurrency, and the uploadConcurrency configuration is unused. This adds unnecessary complexity and could be confusing for future maintenance. I recommend removing the limiter and simplifying the loop to reflect its sequential nature. This would also involve removing the createLimiter function (lines 8-31) and the uploadConcurrency setting from src/stt/config.js.
```js
const results = [];
let accumulatedText = '';
for (const chunk of chunkFiles) {
  const prompt = buildPromptFromTail(accumulatedText);
  const text = await transcribeFile({
    file: chunk.file,
    language,
    prompt,
  });
  const result = { index: chunk.index, text };
  results.push(result);
  accumulatedText = accumulatedText
    ? `${accumulatedText}\n${result.text}`
    : result.text;
}
```

```js
      1000
    )
  );
  const chunkDuration = Math.max(60_000, safeMax);
```
The value 60_000 (1 minute) is a magic number. To improve maintainability and make the configuration more centralized, consider extracting this value into STT_CONFIG in src/stt/config.js (e.g., as fallbackMinChunkMs). A similar magic number 5 * 60_000 is used on line 127, which could also be extracted.
```diff
- const chunkDuration = Math.max(60_000, safeMax);
+ const chunkDuration = Math.max(STT_CONFIG.fallbackMinChunkMs, safeMax);
```
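A minimal sketch of what extracting both magic numbers into the config could look like. The key names (`fallbackMinChunkMs`, `fallbackMaxChunkMs`) follow the reviewer's suggestion and are otherwise assumptions about the shape of `src/stt/config.js`; `clampChunkDuration` is a hypothetical helper for illustration.

```javascript
// Hypothetical additions to STT_CONFIG in src/stt/config.js.
const STT_CONFIG = {
  fallbackMinChunkMs: 60_000,     // was the inline 60_000 (1 minute)
  fallbackMaxChunkMs: 5 * 60_000, // was the inline 5 * 60_000 on line 127
};

// Usage mirroring the suggested change above.
function clampChunkDuration(safeMax) {
  return Math.max(STT_CONFIG.fallbackMinChunkMs, safeMax);
}
```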
```js
describe('merge helpers', () => {
  test('buildPromptFromTail trims tail characters', () => {
    const prompt = buildPromptFromTail('  Example transcript text  ');
    expect(prompt.endsWith('text')).toBe(true);
  });

  test('mergeChunkResults removes duplicate sentences', () => {
    const merged = mergeChunkResults([
      { index: 0, text: 'Hello world. This is chunk one.' },
      { index: 1, text: 'This is chunk one. And here is more.' },
    ]);
    expect(merged).toContain('Hello world.');
    expect(merged).toContain('And here is more.');
    expect(merged).not.toContain('This is chunk one.\nThis is chunk one.');
  });
});
```
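For context, implementations satisfying these tests could be sketched as follows. These are hypothetical reconstructions inferred from the test expectations; the real helpers in the PR's `src/stt/` code may differ in names, parameters (e.g. `maxChars`), and sentence-splitting details.

```javascript
// Keep only the trailing portion of the accumulated transcript as the prompt.
function buildPromptFromTail(text, maxChars = 200) {
  const trimmed = (text || '').trim();
  return trimmed.slice(-maxChars);
}

// Concatenate chunk texts in index order, dropping a leading sentence that
// repeats the previous chunk's final sentence (overlap at chunk boundaries).
function mergeChunkResults(chunks) {
  const sorted = [...chunks].sort((a, b) => a.index - b.index);
  const out = [];
  let lastSentence = '';
  for (const { text } of sorted) {
    const sentences =
      text.match(/[^.!?]+[.!?]+/g)?.map((s) => s.trim()) ?? [text.trim()];
    const deduped = sentences.filter((s, i) => !(i === 0 && s === lastSentence));
    lastSentence = sentences[sentences.length - 1];
    out.push(deduped.join(' '));
  }
  return out.join('\n');
}
```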
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
No description provided.