wildspoof/TTS_baselines

Acoustic model -- GradTTS

Installation

First, install the Python package requirements:

pip install -r requirements.txt

Second, build the monotonic_align extension (Cython):

cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..

Note: the code was tested on Python 3.6.9.

Training

  1. Make filelists for your audio data like the ones included in the resources/filelists folder. For single-speaker training, refer to the ljspeech filelists; for multispeaker training, refer to the libri-tts filelists.
  2. Set the experiment configuration in params.py.
  3. Specify your GPU device and run training script:
    export CUDA_VISIBLE_DEVICES=YOUR_GPU_ID
    python train.py  # if single speaker
    python train_multi_speaker.py  # if multispeaker
  4. To track training progress, run a TensorBoard server on any available port:
    tensorboard --logdir=YOUR_LOG_DIR --port=8888
    During training, all logs and checkpoints are stored in YOUR_LOG_DIR, which you can set in params.py before training.
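Step 1 can be scripted. The sketch below writes a filelist under the assumption that each line is pipe-separated as `wav_path|transcript`, with a trailing `|speaker_id` for multispeaker data. That layout is an assumption, so compare the output with the files in resources/filelists before training. The helper name `build_filelist` and the example paths are hypothetical.

```python
from pathlib import Path


def build_filelist(entries, out_path, speaker_ids=None):
    """Write one `wav_path|transcript[|speaker_id]` line per utterance.

    entries:     iterable of (wav_path, transcript) pairs
    speaker_ids: optional sequence aligned with entries (multispeaker only)
    NOTE: the pipe-separated layout is an assumption; check it against
    the filelists shipped in resources/filelists.
    """
    lines = []
    for i, (wav_path, text) in enumerate(entries):
        fields = [str(wav_path), text.strip()]
        if speaker_ids is not None:
            fields.append(str(speaker_ids[i]))
        lines.append("|".join(fields))
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For example, `build_filelist([("wavs/a.wav", "Hello there.")], "train.txt")` writes a single-speaker list; passing `speaker_ids=[3]` adds the speaker column.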

Inference

You can download Grad-TTS and HiFi-GAN checkpoints trained on the LJSpeech and Libri-TTS datasets (22 kHz) from here.

Put the necessary Grad-TTS checkpoint into the checkpts folder in the root Grad-TTS directory.

  1. Create a text file with the sentences you want to synthesize, like resources/filelists/synthesis.txt.
  2. Run inference.py, providing the path to the text file, the path to the Grad-TTS checkpoint, the number of iterations to use for reverse diffusion (default: 10), and a speaker id if you want to perform multispeaker inference:
    python inference.py -f <your-text-file> -c <grad-tts-checkpoint> -t <number-of-timesteps> -s <speaker-id-if-multispeaker>
  3. Check the folder called out for the generated audio files.
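When comparing several timestep counts or speakers, it can help to build the command line in code. The sketch below only assembles the arguments documented above (`-f`, `-c`, `-t`, `-s`); the helper name `inference_command` and the checkpoint path in the usage note are hypothetical.

```python
import shlex


def inference_command(text_file, checkpoint, timesteps=10, speaker_id=None):
    """Return the argument list for one inference.py run.

    The -s flag is added only when a speaker id is given (multispeaker).
    """
    cmd = ["python", "inference.py",
           "-f", str(text_file),
           "-c", str(checkpoint),
           "-t", str(timesteps)]
    if speaker_id is not None:
        cmd += ["-s", str(speaker_id)]
    return cmd


def as_shell(cmd):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in cmd)
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the root Grad-TTS directory, for example with `inference_command("resources/filelists/synthesis.txt", "checkpts/my_grad_tts.pt", timesteps=50)`.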

Pretrained checkpoint

Download pretrained GradTTS checkpoint here

Vocoder -- DiffWave

Install

Install using pip:

pip install diffwave

or from GitHub:

git clone https://github.com/lmnt-com/diffwave.git
cd diffwave
pip install .

Training

Before you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono. If you need to change the data processing parameters, edit params.py.

python -m diffwave.preprocess /path/to/dir/containing/wavs
python -m diffwave /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all

You should expect to hear intelligible (but noisy) speech by ~8k steps (~1.5h on a 2080 Ti).
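Since the dataset must contain 16-bit mono .wav files, it may be worth scanning the directory before running the preprocessing step. The following is a minimal sketch using only the standard-library `wave` module. The function name `find_unsupported_wavs` is hypothetical, and the check covers PCM files that `wave` can read; any other file is reported as unreadable.

```python
import wave
from pathlib import Path


def find_unsupported_wavs(root):
    """Return (path, reason) pairs for .wav files that are not 16-bit mono."""
    problems = []
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                channels = wav.getnchannels()
                sample_width = wav.getsampwidth()  # bytes per sample
        except (wave.Error, EOFError) as exc:
            problems.append((path, "unreadable: {}".format(exc)))
            continue
        if channels != 1:
            problems.append((path, "{} channels".format(channels)))
        elif sample_width != 2:
            problems.append((path, "{}-bit samples".format(8 * sample_width)))
    return problems
```

An empty result means every file found under the directory passed the check; otherwise each entry names the file and the reason it was flagged.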

Inference API

Basic usage:

from diffwave.inference import predict as diffwave_predict

model_dir = '/path/to/model/dir'
spectrogram = ...  # placeholder: replace with a spectrogram tensor in [N,C,W] format
audio, sample_rate = diffwave_predict(spectrogram, model_dir, fast_sampling=True)

# audio is a GPU tensor in [N,T] format.
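To write the result to disk without extra dependencies, one option is the standard-library `wave` module. This is a sketch that assumes the samples are floats in the range [-1, 1] (values outside that range are clipped) and handles one utterance at a time. The helper name `write_wav_16bit` is hypothetical and is not part of the diffwave package.

```python
import struct
import wave


def write_wav_16bit(path, samples, sample_rate):
    """Write a 1-D sequence of floats in [-1, 1] as a 16-bit mono WAV file."""
    pcm = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip out-of-range values
        pcm += struct.pack("<h", int(round(x * 32767)))  # little-endian int16
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(int(sample_rate))
        wav.writeframes(bytes(pcm))
```

With the tensor returned above, the first item in the batch could be saved with `write_wav_16bit("output.wav", audio[0].cpu().tolist(), sample_rate)`.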

Inference CLI

python -m diffwave.inference --fast /path/to/model /path/to/spectrogram -o output.wav

Batch inference

pip install .
python -m diffwave.batch_inference

Pretrained checkpoint

Download pretrained DiffWave checkpoint here
