Update README for current conversion capabilities #6
base: master
Pull Request Overview
This PR updates the README to accurately reflect the current state of the RWC voice conversion system, clarifying that file-based conversion is the stable functionality while live microphone support is experimental. The documentation is revised to set appropriate expectations around latency and system capabilities.
- Repositioned live microphone support as experimental Phase 1 rather than production-ready
- Updated performance expectations to reflect realistic 500-700ms latency for live pipeline
- Removed user-specific system details to make documentation more generic
- Inference (voice conversion): 2-5 minutes per 30 seconds of audio
- Real-time conversion: 30-50ms latency (live microphone input)
- File-based conversion: depends on model size and hardware; batch processing is recommended for throughput.
Copilot AI, Nov 16, 2025
[nitpick] The description is somewhat vague. Consider adding approximate time ranges or examples (e.g., 'typically 2-5 minutes per 30 seconds of audio on GPU') to help users better understand expected performance, similar to the specificity provided in the removed content.
- File-based conversion: depends on model size and hardware; batch processing is recommended for throughput.
- File-based conversion: typically takes 2–5 minutes per 30 seconds of audio on a modern GPU (e.g., NVIDIA RTX 3060); actual performance depends on model size and hardware. Batch processing is recommended for throughput.
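To make the suggested range concrete, the arithmetic can be sketched as follows. This is only an illustration: the function name `estimate_conversion_seconds` and its default are hypothetical and not part of RWC, and the 2-5 minute figure is the one quoted in the suggestion, not a measured benchmark.

```python
def estimate_conversion_seconds(audio_seconds, minutes_per_30s=(2.0, 5.0)):
    """Return a (low, high) estimate of processing time in seconds.

    minutes_per_30s is the assumed processing cost, in minutes, for each
    30 seconds of input audio (taken from the range quoted in the review).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    low_min, high_min = minutes_per_30s
    chunks = audio_seconds / 30.0  # number of 30-second units of audio
    return (chunks * low_min * 60.0, chunks * high_min * 60.0)


low, high = estimate_conversion_seconds(90)
print(f"90 s clip: roughly {low / 60:.0f}-{high / 60:.0f} minutes")
```

Under that assumption, processing runs about 4x to 10x slower than real time, which is why the README recommends batch processing for file-based conversion.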
- Sample rate: 48000 Hz
- Max channels: 2
- PyAudio and PortAudio are required for microphone capture.
- Device compatibility is unverified; expect limited hardware coverage during this phase.
Copilot AI, Nov 16, 2025
[nitpick] The phrase 'Device compatibility is unverified' could be read as saying that no testing has been done. Consider rephrasing to something like 'Device compatibility may vary; comprehensive hardware testing is ongoing' to describe the current state more precisely.
- Device compatibility is unverified; expect limited hardware coverage during this phase.
- Device compatibility may vary; comprehensive hardware testing is ongoing. Expect limited hardware coverage during this phase.
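For context on the latency figures discussed in this PR, capture buffering alone adds a delay determined by the buffer size and the 48000 Hz sample rate listed in the README. A minimal sketch of that arithmetic is below; the buffer sizes are hypothetical examples rather than values taken from RWC, and the result covers a single capture buffer only, not model inference or playback.

```python
SAMPLE_RATE_HZ = 48000  # sample rate listed in the README diff


def buffer_latency_ms(frames_per_buffer, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time in milliseconds needed to fill one capture buffer."""
    if frames_per_buffer <= 0 or sample_rate_hz <= 0:
        raise ValueError("frames_per_buffer and sample_rate_hz must be positive")
    return frames_per_buffer / sample_rate_hz * 1000.0


# Hypothetical buffer sizes, chosen only for illustration.
for frames in (256, 1024, 4096):
    print(f"{frames} frames -> {buffer_latency_ms(frames):.1f} ms")
```

Even the largest buffer here accounts for well under 100 ms, so under these assumptions most of a 500-700 ms end-to-end figure would have to come from stages other than a single capture buffer.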