Firstly, install all Python package requirements:

```bash
pip install -r requirements.txt
```

Secondly, build `monotonic_align` code (Cython):

```bash
cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..
```

**Note**: code is tested on Python==3.6.9.
- Make filelists of your audio data like the ones included in the `resources/filelists` folder. For single-speaker training refer to the `ljspeech` filelists, and to the `libri-tts` filelists for multispeaker.
- Set the experiment configuration in the `params.py` file.
- Specify your GPU device and run the training script:

```bash
export CUDA_VISIBLE_DEVICES=YOUR_GPU_ID
python train.py                # if single speaker
python train_multi_speaker.py  # if multispeaker
```
- To track your training process, run a TensorBoard server on any available port:

```bash
tensorboard --logdir=YOUR_LOG_DIR --port=8888
```

During training, all logging information and checkpoints are stored in `YOUR_LOG_DIR`, which you can specify in `params.py` before training.
You can download Grad-TTS and HiFi-GAN checkpoints trained on the LJSpeech* and Libri-TTS datasets (22kHz) from here.

Put the necessary Grad-TTS checkpoints into the `checkpts` folder in the root Grad-TTS directory.
- Create a text file with the sentences you want to synthesize, like `resources/filelists/synthesis.txt`.
- Run the `inference.py` script, providing the path to the text file, the path to the Grad-TTS checkpoint, the number of iterations to be used for reverse diffusion (default: 10), and the speaker id if you want to perform multispeaker inference:

```bash
python inference.py -f <your-text-file> -c <grad-tts-checkpoint> -t <number-of-timesteps> -s <speaker-id-if-multispeaker>
```

- Check out the folder called `out` for the generated audio.
Download the pretrained Grad-TTS checkpoint here.
Install using pip:

```bash
pip install diffwave
```

or from GitHub:

```bash
git clone https://github.com/lmnt-com/diffwave.git
cd diffwave
pip install .
```
Before you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono. If you need to change the data processing parameters, edit `params.py`.
```bash
python -m diffwave.preprocess /path/to/dir/containing/wavs
```
```bash
python -m diffwave /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all
```
You should expect to hear intelligible (but noisy) speech by ~8k steps (~1.5h on a 2080 Ti).
Basic usage:

```python
from diffwave.inference import predict as diffwave_predict

model_dir = '/path/to/model/dir'
spectrogram = ...  # get your hands on a spectrogram in [N,C,W] format
audio, sample_rate = diffwave_predict(spectrogram, model_dir, fast_sampling=True)
# audio is a GPU tensor in [N,T] format.
```

Or from the command line:

```bash
python -m diffwave.inference --fast /path/to/model /path/to/spectrogram -o output.wav
```
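For reference, the `[N,C,W]` layout is batch × mel channels × frames. The snippet below builds a dummy stand-in with NumPy to illustrate the shape only; the 80-channel figure assumes diffwave's default mel configuration, and `diffwave_predict` itself expects a torch tensor, not a NumPy array.

```python
import numpy as np

# Dummy stand-in for a mel spectrogram in [N, C, W] layout:
# N = batch size, C = mel channels (assuming the default 80),
# W = number of spectrogram frames.
N, C, W = 1, 80, 200
spectrogram = np.random.randn(N, C, W).astype(np.float32)
print(spectrogram.shape)  # (1, 80, 200)
```

In practice you would load the `.spec.npy` files produced by `diffwave.preprocess` rather than random data.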
```bash
pip install .
python -m diffwave.batch_inference
```
Download the pretrained DiffWave checkpoint here.
- The Monotonic Alignment Search algorithm is used for unsupervised duration modelling; official GitHub repository: link.
- Phonemization utilizes CMUdict; official GitHub repository: link.
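For intuition, Monotonic Alignment Search is a dynamic program that assigns each mel frame to exactly one text token, monotonically, so as to maximize the total log-likelihood. The NumPy sketch below illustrates the idea under simplifying assumptions (it requires at least as many frames as tokens); it is not the repository's optimized Cython implementation.

```python
import numpy as np

def monotonic_alignment_search(log_p):
    """Sketch of MAS via dynamic programming.

    log_p: array of shape [T_text, T_mel] holding the log-likelihood of
    each mel frame under each text token's distribution. Assumes
    T_mel >= T_text. Returns a 0/1 alignment matrix of the same shape
    where each frame is assigned to exactly one token, monotonically.
    """
    T_text, T_mel = log_p.shape
    Q = np.full((T_text, T_mel), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, T_mel):
        # Token index can never exceed frame index (one token per frame).
        for i in range(min(j + 1, T_text)):
            stay = Q[i, j - 1]                               # same token
            move = Q[i - 1, j - 1] if i > 0 else -np.inf     # next token
            Q[i, j] = log_p[i, j] + max(stay, move)
    # Backtrack from the last token at the last frame.
    align = np.zeros_like(log_p)
    i = T_text - 1
    for j in range(T_mel - 1, -1, -1):
        align[i, j] = 1.0
        # Move to the previous token when it is better (or forced by i == j).
        if j > 0 and (i == j or (i > 0 and Q[i - 1, j - 1] >= Q[i, j - 1])):
            i -= 1
    return align
```

The durations used by Grad-TTS then follow by summing each row of the alignment matrix (frames per token).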
- DiffWave: A Versatile Diffusion Model for Audio Synthesis
- Denoising Diffusion Probabilistic Models
- Code for Denoising Diffusion Probabilistic Models
- Text-To-Speech Synthesis In The Wild