CONF
Motlicek_INTERSPEECH2014_2014/IDIAP
Development of Bilingual ASR System for MediaParl Corpus
Motlicek, Petr
Imseng, David
Cernak, Milos
Kim, Namhoon
EXTERNAL
https://publications.idiap.ch/attachments/papers/2014/Motlicek_INTERSPEECH2014_2014.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Motlicek_Idiap-RR-21-2014
Related documents
Proceedings of the 15th Annual Conference of the International Speech Communication Association (Interspeech 2014)
Singapore
2014
ISCA
The development of an Automatic Speech Recognition (ASR)
system for the bilingual MediaParl corpus is challenging for
several reasons: (1) reverberant recordings, (2) accented speech,
and (3) no prior information about the language. In that context,
we employ frequency domain linear prediction-based (FDLP)
features to reduce the effect of reverberation, exploit bilingual
deep neural networks applied in Tandem and hybrid acoustic
modeling approaches to significantly improve ASR for accented
speech, and develop a fully bilingual ASR system using
entropy-based decoding-graph selection. Our experiments indicate
that the proposed bilingual ASR system performs similarly to
a language-specific ASR system if approximately five seconds
of speech are available.
REPORT
Motlicek_Idiap-RR-21-2014/IDIAP
Development of Bilingual ASR System for MediaParl Corpus
Motlicek, Petr
Imseng, David
Cernak, Milos
Kim, Namhoon
language identification
Multilingual automatic speech recognition
non-native speech
EXTERNAL
https://publications.idiap.ch/attachments/reports/2014/Motlicek_Idiap-RR-21-2014.pdf
PUBLIC
Idiap-RR-21-2014
2014
Idiap
Rue Marconi 19
December 2014
The development of an Automatic Speech Recognition (ASR)
system for the bilingual MediaParl corpus is challenging for
several reasons: (1) reverberant recordings, (2) accented speech,
and (3) no prior information about the language. In that context,
we employ frequency domain linear prediction-based (FDLP)
features to reduce the effect of reverberation, exploit bilingual
deep neural networks applied in Tandem and hybrid acoustic
modeling approaches to significantly improve ASR for accented
speech, and develop a fully bilingual ASR system using
entropy-based decoding-graph selection. Our experiments indicate
that the proposed bilingual ASR system performs similarly to
a language-specific ASR system if approximately five seconds
of speech are available.