Idiap Research Institute
Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis
Type of publication: Journal paper
Citation: Saheer_IEEE-TASL_2012
Publication status: Accepted
Journal: IEEE Transactions on Audio, Speech and Language Processing
Year: 2012
Abstract: Vocal tract length normalization (VTLN) has been used successfully in automatic speech recognition to improve performance. The same technique can be applied in statistical parametric speech synthesis for rapid speaker adaptation. This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges of implementing VTLN for synthesis. Jacobian normalization, high-dimensional features, and truncation of the transformation matrix are among the challenges presented, each with an appropriate solution. Detailed evaluations are performed to determine the most suitable technique for using VTLN in speech synthesis. Evaluating VTLN in the framework of speech synthesis is itself not an easy task, since the technique does not work equally well for all speakers. Speakers have been selected according to different objective and subjective criteria to demonstrate the differences between systems. The best method for implementing VTLN is confirmed to be the use of lower-order features for estimating warping factors.
Projects: Idiap
Authors: Saheer, Lakshmi; Dines, John; Garner, Philip N.
  • Saheer_IEEE-TASL_2012.pdf