Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction
| Type of publication: | Conference paper |
| Citation: | Schnell_SSW11_2021 |
| Publication status: | Accepted |
| Booktitle: | 11th ISCA Speech Synthesis Workshop |
| Year: | 2021 |
| Month: | August |
| URL: | https://www.idiap.ch/paper/ssw... |
| Abstract: | We aim to provide controls for emotion in synthetic speech. Many emotions are not displayed continuously in an otherwise emotional utterance; rather, the intensity varies with time. We show that an emotion recogniser is capable of producing a measure of emotion intensity via attention or saliency; this measure is appropriate to label utterances subsequently used to train a speech synthesiser. We evaluate novel and published means to do this, showing that, whilst it is no longer state of the art for emotion recognition, attention is a good way to indicate emotion intensity for speech synthesis. |
| Keywords: | Emotion recognition, emotional speech synthesis, saliency mapping, TTS |
| Projects: | NAST, Idiap |
| Authors: | |