The custom local training GUI has moved to DiffTrainer
- lab + wav (NNSVS format)
- csv + wav (DiffSinger format)
- ds (DiffSinger .ds files)
- the name of your_speaker_folder is used as spk_name, so choose your folder names carefully
- the Colab notebook runs primarily on Python, so spaces and special characters in file names or folder paths may be invalid
- for an in-depth guide to SVS training and/or labeling, please see SVS Singing Voice Database - Tutorial
- it is advised to edit your data with SlurCutter to get more refined data for your pitch model
- please visit the DiffSinger Discord for help and questions regarding model production
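The naming advice above can be checked before uploading. The following is a minimal sketch, not part of the notebook: the function name `unsafe_path_parts` is hypothetical, and the "safe" character set (ASCII letters, digits, underscore, hyphen, dot) is an assumption, since the notebook does not state an exact rule.

```python
import re

# Assumption: a "safe" name uses only ASCII letters, digits, "_", "-" and ".".
# The notebook does not define the exact rule; adjust the pattern as needed.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


def unsafe_path_parts(path):
    """Return the path components that contain spaces or special characters."""
    # Treat both "/" and "\" as separators so Windows-style paths also work.
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    return [p for p in parts if not SAFE_NAME.match(p)]


if __name__ == "__main__":
    for candidate in ("my_speaker/data_1.wav", "my speaker/data#1.wav"):
        bad = unsafe_path_parts(candidate)
        print(candidate, "->", "ok" if not bad else f"rename: {bad}")
```

An empty list means every folder and file name in the path passes the assumed rule.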
Zip file format examples:
[NOTE] .ds training uses the same zip organization as lab + wav, but with only .ds files (no wav needed)
#single speaker (lab + wav)
your_zip.zip:
    |
    |
    your_speaker_folder:
        |
        |
        data_1.wav
        data_1.lab
        .
        data_2.wav
        data_2.lab
        .
        data_3.wav
        data_3.lab
        .
        ...
#single speaker (csv + wav)
your_zip.zip:
    |
    |
    your_speaker_folder:
        |
        |
        wavs (folder named "wavs" containing all the wavs)
        .
        transcriptions.csv
#multi speaker (lab + wav)
your_zip.zip:
    |
    |
    your_speaker_folder_1:
        |
        |
        data_1.wav
        data_1.lab
        .
        data_2.wav
        data_2.lab
        .
        data_3.wav
        data_3.lab
        .
        ...
    your_speaker_folder_2:
        |
        |
        data_1.wav
        data_1.lab
        .
        data_2.wav
        data_2.lab
        .
        data_3.wav
        data_3.lab
        .
        ...
#multi speaker (csv + wav)
your_zip.zip:
    |
    |
    your_speaker_folder_1:
        |
        |
        wavs (folder named "wavs" containing all the wavs)
        .
        transcriptions.csv
    your_speaker_folder_2:
        |
        |
        wavs (folder named "wavs" containing all the wavs)
        .
        transcriptions.csv
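To confirm that a zip matches one of the layouts above before uploading it, something like the following sketch may help. It is not taken from the notebook: `summarize_dataset_zip` is a hypothetical helper, and the detection rules (a `transcriptions.csv` plus a `wavs` folder means csv + wav, paired `.lab` and `.wav` files mean lab + wav, `.ds` files mean ds) are assumptions read off the examples.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_dataset_zip(zip_path):
    """Map each top-level speaker folder in the zip to a guessed data format."""
    files_by_speaker = defaultdict(list)
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            parts = PurePosixPath(name).parts
            if len(parts) < 2:
                continue  # file at the zip root, not inside a speaker folder
            # Store the path relative to the speaker folder as a tuple.
            files_by_speaker[parts[0]].append(parts[1:])

    summary = {}
    for speaker, entries in files_by_speaker.items():
        suffixes = {PurePosixPath(*e).suffix.lower() for e in entries}
        has_csv = ("transcriptions.csv",) in entries
        has_wavs_dir = any(
            len(e) == 2 and e[0] == "wavs" and e[1].lower().endswith(".wav")
            for e in entries
        )
        # Assumed detection rules, derived only from the layouts shown above.
        if has_csv and has_wavs_dir:
            summary[speaker] = "csv + wav"
        elif ".lab" in suffixes and ".wav" in suffixes:
            summary[speaker] = "lab + wav"
        elif ".ds" in suffixes:
            summary[speaker] = "ds"
        else:
            summary[speaker] = "unknown"
    return summary
```

Each key of the returned dictionary is a folder name, which is the value that would end up as spk_name; a multi-speaker zip simply yields more than one key.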
- wav
- manually segmented audio is suggested for cleaner segments (though auto segmentation makes only a minimal difference)
- the zip can contain any type of file, even subfolders; data extraction only adds the .wav files inside the zip to the training set
- lab + wav (NNSVS format)
- this notebook is still a rough draft; please either avoid it or use it with caution
- [notebook] improve SOFA notebook, add inference
- [notebook] update dictionary conversion code for phoneme types in build OU
- [notebook] clean up multi-dict notebook and support logic for dictionary generation for out-of-specified-lang labels (/)
- [resource] add example file(s) for multi-dictionary training
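For the wav-only zips mentioned earlier, where only the .wav files inside the zip are used regardless of other files or subfolders, the rule can be illustrated with a short sketch. This is not the notebook's own code: `extract_wavs` is a hypothetical helper, and flattening subfolders into a single output directory is an assumption.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_wavs(zip_path, out_dir):
    """Copy every .wav found anywhere in the zip into out_dir and list them."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Ignore folders and anything that is not a .wav file.
            if info.is_dir() or not info.filename.lower().endswith(".wav"):
                continue
            # Assumption: subfolders are flattened, so two files with the
            # same name in different folders would overwrite each other.
            target = out_dir / PurePosixPath(info.filename).name
            target.write_bytes(zf.read(info))
            extracted.append(target.name)
    return sorted(extracted)
```

Non-wav files in the zip are skipped rather than rejected, which matches the note that the zip may contain any type of file.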
Credits:
- openvpi for DiffSinger fork and more
- UtaUtaUtau for nnsvs-db-converter
- Kei for the original notebook
- MLo7 for the repo's content
- PixPrucer for an in-depth SVS guide
- haru0l for the base pretrain with embeds
- AgentAsteriski for the local GUI