CONF
Imseng_INTERSPEECH-2_2010/IDIAP
Towards mixed language speech recognition systems
Imseng, David
Bourlard, Hervé
Magimai-Doss, Mathew
EXTERNAL
https://publications.idiap.ch/attachments/papers/2010/Imseng_INTERSPEECH-2_2010.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Imseng_Idiap-RR-15-2010
Related documents
Proceedings of Interspeech
Makuhari, Japan
2010
September 2010
278-281
Multilingual speech recognition involves numerous research challenges, including common phoneme sets, adaptation with limited amounts of training data, and mixed language recognition (common in many countries, such as Switzerland). In the latter case, one cannot even assume that the language being spoken is known in advance. This is the context and motivation of the present work. We investigate how current state-of-the-art speech recognition systems can be exploited in multilingual environments where the language (from an assumed set of five possible languages, in our case) is not known a priori during recognition. We combine monolingual systems and extensively develop and compare different features and acoustic models. On SpeechDat(II) datasets, and in the context of isolated words, we show that it is possible to approach the performance of monolingual systems even if the identity of the spoken language is not known a priori.
REPORT
Imseng_Idiap-RR-15-2010/IDIAP
Towards mixed language speech recognition systems
Imseng, David
Bourlard, Hervé
Magimai-Doss, Mathew
EXTERNAL
https://publications.idiap.ch/attachments/reports/2010/Imseng_Idiap-RR-15-2010.pdf
PUBLIC
Idiap-RR-15-2010
2010
Idiap
July 2010
Multilingual speech recognition involves numerous research challenges, including common phoneme sets, adaptation with limited amounts of training data, and mixed language recognition (common in many countries, such as Switzerland). In the latter case, one cannot even assume that the language being spoken is known in advance. This is the context and motivation of the present work. We investigate how current state-of-the-art speech recognition systems can be exploited in multilingual environments where the language (from an assumed set of five possible languages, in our case) is not known a priori during recognition. We combine monolingual systems and extensively develop and compare different features and acoustic models. On SpeechDat(II) datasets, and in the context of isolated words, we show that it is possible to approach the performance of monolingual systems even if the identity of the spoken language is not known a priori.