CONF Imseng_INTERSPEECH_2010/IDIAP
Title: Hierarchical Multilayer Perceptron based Language Identification
Authors: Imseng, David; Magimai.-Doss, Mathew; Bourlard, Hervé
PDF: EXTERNAL https://publications.idiap.ch/attachments/papers/2010/Imseng_INTERSPEECH_2010.pdf (PUBLIC)
Related documents: https://publications.idiap.ch/index.php/publications/showcite/Imseng_Idiap-RR-14-2010
Published in: Proceedings of Interspeech, Makuhari, Japan, September 2010, pp. 2722-2725
Abstract: Automatic language identification (LID) systems generally exploit acoustic knowledge, possibly enriched by explicit language-specific phonotactic or lexical constraints. This paper investigates a new LID approach based on hierarchical multilayer perceptron (MLP) classifiers, in which the first layer is a "universal phoneme set MLP classifier". The resulting (multilingual) phoneme posterior sequence is fed into a second MLP that takes a larger temporal context into account. The second MLP can implicitly learn and exploit different types of patterns, such as confusions between phonemes and/or phonotactics, for LID. We investigate the viability of the proposed approach by comparing it against two standard approaches that use phonotactic and lexical constraints with the universal phoneme set MLP classifier as the emission probability estimator. On SpeechDat(II) datasets of five European languages, the proposed approach yields significantly better performance than the two standard approaches.

REPORT Imseng_Idiap-RR-14-2010/IDIAP
Title: Hierarchical Multilayer Perceptron based Language Identification
Authors: Imseng, David; Magimai.-Doss, Mathew; Bourlard, Hervé
PDF: EXTERNAL https://publications.idiap.ch/attachments/reports/2010/Imseng_Idiap-RR-14-2010.pdf (PUBLIC)
Report: Idiap-RR-14-2010, Idiap, July 2010
Abstract: identical to the Interspeech 2010 paper above.
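
The abstract describes a two-stage hierarchy: a first MLP maps acoustic frames (with some context) to universal, multilingual phoneme posteriors, and a second MLP reads a wider window of those posteriors to estimate language posteriors. The sketch below illustrates how such a hierarchy could be wired; it is not the authors' implementation, and all dimensions (39-dim features, 9-frame acoustic context, 50-phoneme universal set, 21-frame posterior context, hidden sizes) are hypothetical placeholders.

```python
# Illustrative sketch only (not the authors' code): hierarchical MLP for LID.
import torch
import torch.nn as nn

N_FEATS = 39        # acoustic feature dimension per frame (assumed)
ACOUSTIC_CTX = 9    # acoustic context window for the first MLP (assumed)
N_PHONEMES = 50     # size of the universal (multilingual) phoneme set (assumed)
POSTERIOR_CTX = 21  # larger temporal context over phoneme posteriors (assumed)
N_LANGUAGES = 5     # five SpeechDat(II) languages


def stack_context(frames: torch.Tensor, ctx: int) -> torch.Tensor:
    """Stack ctx consecutive frames into one input vector per position."""
    windows = frames.unfold(0, ctx, 1)             # (T - ctx + 1, D, ctx)
    return windows.transpose(1, 2).reshape(windows.size(0), -1)


class UniversalPhonemeMLP(nn.Module):
    """First stage: universal phoneme set MLP classifier
    (acoustic context -> multilingual phoneme posteriors)."""
    def __init__(self, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATS * ACOUSTIC_CTX, hidden),
            nn.Sigmoid(),
            nn.Linear(hidden, N_PHONEMES),
            nn.Softmax(dim=-1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)                         # (T, N_PHONEMES)


class LanguageMLP(nn.Module):
    """Second stage: reads a wider window of phoneme posteriors and outputs
    per-frame language posteriors, implicitly capturing phoneme confusions
    and phonotactic regularities."""
    def __init__(self, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_PHONEMES * POSTERIOR_CTX, hidden),
            nn.Sigmoid(),
            nn.Linear(hidden, N_LANGUAGES),
            nn.Softmax(dim=-1),
        )

    def forward(self, p: torch.Tensor) -> torch.Tensor:
        return self.net(p)                         # (T', N_LANGUAGES)


def identify_language(features: torch.Tensor,
                      phoneme_mlp: UniversalPhonemeMLP,
                      language_mlp: LanguageMLP) -> int:
    """Utterance-level decision: average frame-level language posteriors."""
    with torch.no_grad():
        acoustic_in = stack_context(features, ACOUSTIC_CTX)    # (T, N_FEATS*CTX)
        phoneme_post = phoneme_mlp(acoustic_in)                # (T, N_PHONEMES)
        lang_in = stack_context(phoneme_post, POSTERIOR_CTX)   # (T', N_PHONEMES*CTX)
        lang_post = language_mlp(lang_in)                      # (T', N_LANGUAGES)
        return int(lang_post.mean(dim=0).argmax())


if __name__ == "__main__":
    # Example: score a random 300-frame "utterance" (placeholder data).
    utterance = torch.randn(300, N_FEATS)
    print(identify_language(utterance, UniversalPhonemeMLP(), LanguageMLP()))
```

In this sketch the utterance-level decision simply averages the second MLP's frame-level language posteriors; the abstract does not specify how the paper combines frame-level scores, so that choice is an assumption.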