Implement AV-Hubert unit pretraining task and UTUT pretraining/finetuning pipelines with associated data and configurations.
This project requires several large binary files that cannot be stored in this GitHub repository due to the 100MB file limit.
Please download them from Google Drive using gdown:
- Install gdown:
pip install gdown
After downloading, ensure the directory structure is:
AV2AV_granted_resources/utut_finetune/data/dataset_mbart_ft_bin_data/
├── (existing files...)
├── train.en-es.en.bin
└── train.en-es.es.bin
These files exceed GitHub's 100MB file limit and must be downloaded manually before running the code.
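
The layout above can be checked before launching any training run. The following is a minimal sketch, not part of the project: the helper name `missing_files` and the "non-empty file" check are assumptions added for illustration, while the directory and file names are copied from the structure shown above. It assumes the script is run from the directory that contains AV2AV_granted_resources.

```python
from pathlib import Path

# Directory and file names copied from the layout shown above.
DATA_DIR = Path("AV2AV_granted_resources/utut_finetune/data/dataset_mbart_ft_bin_data")
REQUIRED_FILES = ("train.en-es.en.bin", "train.en-es.es.bin")


def missing_files(root="."):
    """Return the required files that are absent or empty under `root`.

    Hypothetical helper for illustration; `root` is the directory that
    contains AV2AV_granted_resources.
    """
    data_dir = Path(root) / DATA_DIR
    missing = []
    for name in REQUIRED_FILES:
        path = data_dir / name
        # An empty file usually indicates an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path)
    return missing


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        for path in missing:
            print(f"missing: {path}")
        raise SystemExit(1)
    print("All required files are in place.")
```

Running the script exits with a non-zero status and lists each missing path if either file is absent, so it can serve as a quick guard at the start of a setup script.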