Neural VTLN for Speaker Adaptation in TTS

Type of publication:	Conference paper
Citation:	Schnell_SSW10_2019
Publication status:	Published
Booktitle:	Proc. 10th ISCA Speech Synthesis Workshop
Year:	2019
Month:	September
Pages:	6
Location:	Vienna, Austria
Organization:	ISCA
DOI:	10.21437/ssw.2019-6
Abstract:	Vocal tract length normalisation (VTLN) is well established as a speaker adaptation technique that can work with very little adaptation data. It is also well known that VTLN can be cast as a linear transform in the cepstral domain. Building on this latter property, we show that it can be cast as a (linear) layer in a deep neural network (DNN) for speech synthesis. We show that VTLN parameters can then be trained in the same framework as the rest of the DNN using automatic gradients. Experimental results show that the DNN is capable of predicting phone-dependent warpings on artificial data, and that such warpings improve the quality of an acoustic model on real data in subjective listening tests.
Keywords:
Projects:	MASS
Authors:	Schnell, Bastian; Garner, Philip N.
Attachments
Schnell_SSW10_2019.pdf