End-to-End Accented Speech Recognition

Type of publication:	Conference paper
Citation:	Viglino_INTERSPEECH_2019
Publication status:	Published
Booktitle:	International Conference on Speech and Language Processing, Interspeech
Year:	2019
Month:	September
Pages:	2140-2144
Location:	Graz, Austria
Organization:	ISCA
Crossref:	Viglino_Idiap-RR-04-2022: End-to-end Accented Speech Recognition, Viglino, Thibault, Motlicek, Petr and Cernak, Milos, Idiap-RR-04-2022
DOI:	10.21437
Abstract:	Correct pronunciation is known to be the most difficult part to acquire for (native or non-native) language learners. Accented speech is thus more variable, and standard Automatic Speech Recognition (ASR) training approaches that rely on intermediate phone alignment might introduce errors during ASR training. With end-to-end training we could alleviate this problem. In this work, we explore the use of multi-task training and accent embedding in the context of end-to-end ASR trained with the connectionist temporal classification loss. Compared to a baseline developed using a conventional ASR framework exploiting time-delay neural networks trained on accented English, we show a significant relative improvement of about 25% in word error rate. Additional evaluation on unseen accent data yields relative improvements of 31% and 2% for New Zealand English and Indian English, respectively.
Keywords:
Projects	Idiap CTI-Shaped
Authors	Viglino, Thibault; Motlicek, Petr; Cernak, Milos
Attachments
Viglino_INTERSPEECH_2019.pdf
