- Rayhan Meghji
- Matthew McQuistion
Our project combines audio source separation that targets timbral differences with visual speaker identification. Both methods use the Discrete Wavelet Transform (DWT).
Our model can be used on Hugging Face Spaces to process video and audio files either separately or together, or locally with:
- `python src/main.py` for audio
- `python src/vision/main_vision.py` for video
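
For readers new to the Discrete Wavelet Transform that both pipelines rely on, the sketch below shows a single level of the Haar DWT, the simplest wavelet, in plain Python. It is only an illustration of the general idea (splitting a signal into a coarse approximation and fine detail coefficients). The function names `haar_dwt` and `haar_idwt` are hypothetical and are not taken from this repository, and the wavelet family and decomposition depth this project actually uses may differ.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT (illustrative; not this project's code).

    Returns (approx, detail): scaled pairwise sums and differences.
    """
    x = [float(v) for v in signal]
    if len(x) % 2:
        # Assumption: pad odd-length input by repeating the last sample.
        x.append(x[-1])
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]  # low-pass
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]  # high-pass
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the Haar DWT produced by haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)  # even-indexed sample
        out.append((a - d) / s)  # odd-indexed sample
    return out


if __name__ == "__main__":
    approx, detail = haar_dwt([4, 4, 2, 0])
    print("approx:", approx)
    print("detail:", detail)
    print("reconstructed:", haar_idwt(approx, detail))
```

Flat regions of the signal yield detail coefficients of zero, while abrupt changes yield large ones, which is why wavelet coefficients are a common starting point for separating sources or regions with different local structure.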