Does the TTS model have instruction controls on top of voice cloning?
Can it also change emotion based on the context of the text? For example, can we clone a voice and then instruct it to sound angry or happy, or to shout or whisper? Or does it adjust these automatically based on the text?
Does it support non-verbal sounds like laughter or other paralinguistic elements?
Thanks!