Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction
Type of publication: | Conference paper |
Citation: | Schnell_SSW11_2021 |
Publication status: | Accepted |
Booktitle: | 11th ISCA Speech Synthesis Workshop |
Year: | 2021 |
Month: | August |
URL: | https://www.idiap.ch/paper/ssw... |
Abstract: | We aim to provide controls for emotion in synthetic speech. Many emotions are not displayed continuously in an otherwise emotional utterance; rather, the intensity varies with time. We show that an emotion recogniser is capable of producing a measure of emotion intensity via attention or saliency; this measure is appropriate to label utterances subsequently used to train a speech synthesiser. We evaluate novel and published means to do this, showing that, whilst it is no longer state of the art for emotion recognition, attention is a good way to indicate emotion intensity for speech synthesis. |
Keywords: | Emotion Recognition, Emotional Speech Synthesis, Saliency Mapping, TTS |
Projects |
NAST Idiap |
Authors | |