CONF Motlicek_INTERSPEECH2014_2014/IDIAP
Development of Bilingual ASR System for MediaParl Corpus
Motlicek, Petr; Imseng, David; Cernak, Milos; Kim, Namhoon
EXTERNAL https://publications.idiap.ch/attachments/papers/2014/Motlicek_INTERSPEECH2014_2014.pdf PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Motlicek_Idiap-RR-21-2014 Related documents
Proceedings of the 15th Annual Conference of the International Speech Communication Association (Interspeech 2014)
Singapore 2014 ISCA
The development of an Automatic Speech Recognition (ASR) system for the bilingual MediaParl corpus is challenging for several reasons: (1) reverberant recordings, (2) accented speech, and (3) no prior information about the language. In that context, we employ frequency domain linear prediction-based (FDLP) features to reduce the effect of reverberation, exploit bilingual deep neural networks applied in Tandem and hybrid acoustic modeling approaches to significantly improve ASR for accented speech, and develop a fully bilingual ASR system using entropy-based decoding-graph selection. Our experiments indicate that the proposed bilingual ASR system performs similarly to a language-specific ASR system if approximately five seconds of speech are available.

REPORT Motlicek_Idiap-RR-21-2014/IDIAP
Development of Bilingual ASR System for MediaParl Corpus
Motlicek, Petr; Imseng, David; Cernak, Milos; Kim, Namhoon
language identification; Multilingual automatic speech recognition; non-native speech
EXTERNAL https://publications.idiap.ch/attachments/reports/2014/Motlicek_Idiap-RR-21-2014.pdf PUBLIC
Idiap-RR-21-2014
2014 Idiap Rue Marconi 19 December 2014
The development of an Automatic Speech Recognition (ASR) system for the bilingual MediaParl corpus is challenging for several reasons: (1) reverberant recordings, (2) accented speech, and (3) no prior information about the language.
In that context, we employ frequency domain linear prediction-based (FDLP) features to reduce the effect of reverberation, exploit bilingual deep neural networks applied in Tandem and hybrid acoustic modeling approaches to significantly improve ASR for accented speech, and develop a fully bilingual ASR system using entropy-based decoding-graph selection. Our experiments indicate that the proposed bilingual ASR system performs similarly to a language-specific ASR system if approximately five seconds of speech are available.