Idiap Research Institute
Investigating a neural all pass warp in modern TTS applications
Type of publication: Journal paper
Citation: Schnell_SPECOM_2022
Publication status: Published
Journal: Speech Communication
Volume: 138
Year: 2022
Month: March
Pages: 26–37
Note: Open Access
DOI: 10.1016/j.specom.2021.12.002
Abstract: We present a neural implementation of the all pass warp (APW) previously used for vocal tract length normalisation. This includes an efficient back-propagation, which can easily be integrated in modern neural network frameworks. The APW offers a low-dimensional control to alter the spectrum, which by design generalises over different speakers. We investigate the APW in two tasks required for future dialogue or translation agents, and provide a fairly thorough literature review for both: (1) Zero-shot speaker adaptation, to preserve the source speaker identity from very small amounts of data. Experiments show increased speaker similarity and prove that the APW increases the generalisability of a multi-speaker model. (2) Emotional speech synthesis, to translate or produce affective cues. To the best of our knowledge, this is the first attempt at emotional speech synthesis with an APW. While the APW is not able to increase expressiveness or audio quality, our analysis shows that the warping correlates with the level of valence in the emotion. This work should enable future research on emotion translation during machine translation.
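The abstract does not state the warping function itself. As a hedged illustration only, the sketch below assumes the conventional first-order all-pass (bilinear) warp familiar from vocal tract length normalisation and mel-cepstral analysis, which substitutes z^-1 -> (z^-1 - alpha) / (1 - alpha z^-1) and so maps a normalised angular frequency omega in [0, pi] to omega + 2 arctan(alpha sin omega / (1 - alpha cos omega)). The single parameter alpha is the "low-dimensional control" over the spectrum. The function names `all_pass_warp` and `warp_spectrum` are hypothetical, and this is not the authors' neural implementation or their back-propagation; it only shows the frequency mapping that such a layer is built around.

```python
import numpy as np


def all_pass_warp(omega: np.ndarray, alpha: float) -> np.ndarray:
    """Map normalised angular frequencies in [0, pi] through a first-order all-pass warp.

    Assumed convention: z^-1 -> (z^-1 - alpha) / (1 - alpha * z^-1).
    alpha = 0 is the identity; alpha > 0 stretches low frequencies upwards.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (-1, 1) for a stable all-pass filter")
    # The denominator 1 - alpha*cos(omega) is strictly positive for |alpha| < 1,
    # so a plain arctan is safe here.
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))


def warp_spectrum(spectrum: np.ndarray, alpha: float) -> np.ndarray:
    """Resample a spectrum sampled uniformly on [0, pi] along the warped axis (hypothetical helper)."""
    omega = np.linspace(0.0, np.pi, len(spectrum))
    # The warp is monotonic and keeps 0 and pi fixed, so each output bin
    # reads the input spectrum at its warped frequency via linear interpolation.
    return np.interp(all_pass_warp(omega, alpha), omega, spectrum)


if __name__ == "__main__":
    grid = np.linspace(0.0, np.pi, 5)
    print(all_pass_warp(grid, 0.42))
```

Because the whole mapping is controlled by one scalar and fixes the endpoints 0 and pi, a model only has to predict alpha per speaker or utterance, which is one plausible reading of why the abstract describes the APW as generalising across speakers by design.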
Projects: NAST
Authors: Schnell, Bastian; Garner, Philip N.