Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction
Type of publication: Conference paper
Citation: Schnell_SSW11_2021
Publication status: Accepted
Booktitle: 11th ISCA Speech Synthesis Workshop
Year: 2021
Month: August
URL: https://www.idiap.ch/paper/ssw...
Abstract: We aim to provide controls for emotion in synthetic speech. Many emotions are not displayed continuously in an otherwise emotional utterance; rather, the intensity varies with time. We show that an emotion recogniser is capable of producing a measure of emotion intensity via attention or saliency; this measure is appropriate to label utterances subsequently used to train a speech synthesiser. We evaluate novel and published means of doing this, showing that, whilst attention is no longer state of the art for emotion recognition, it is a good way to indicate emotion intensity for speech synthesis.
Keywords: emotion recognition, emotional speech synthesis, saliency mapping, TTS
Projects: NAST
Authors: Schnell, Bastian; Garner, Philip N.
