Template-based ASR using Posterior features and synthetic references: comparing different TTS systems

Type of publication:	Conference paper
Citation:	Soldo_SAPA-SCALE_2012
Publication status:	Accepted
Booktitle:	SAPA-SCALE Conference, International Speech Communication Association
Year:	2012
Month:	September
Abstract:	In recent work, using phone class-conditional posterior probabilities (posterior features) directly as features has yielded successful results in template-based ASR systems. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features to undesired variability, we investigate the use of synthetic speech to generate reference templates. Using synthetic speech in template-based ASR not only addresses the issue of in-domain data collection but also allows the vocabulary to be expanded. On 75- and 600-word task-independent and speaker-independent setups of the Phonebook corpus, we show the feasibility of this approach by investigating different synthetic voices produced by an HTS-based synthesizer trained on two different databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility.
Keywords:	Posterior features, speech recognition, synthetic reference templates, template-based approach
Projects	Idiap FP 7
Authors	Soldo, Serena; Magimai-Doss, Mathew; Bourlard, Hervé
Attachments
Soldo_SAPA-SCALE_2012.pdf
