Integrating Language Identification to improve Multilingual Speech Recognition

Type of publication:	Idiap-RR
Citation:	Caesar_Idiap-RR-24-2012
Number:	Idiap-RR-24-2012
Year:	2012
Month:	7
Institution:	Idiap
Abstract:	The process of determining the language of a speech utterance is called Language Identification (LID). This task can be very challenging, as it has to take into account various language-specific aspects, such as phonetic, phonotactic, vocabulary, and grammar-related cues. In multilingual speech recognition we try to find the most likely word sequence for an utterance whose language is not known a priori. This is considerably harder than monolingual speech recognition, and it is common to use LID to estimate the current language. In this project we present two general approaches for LID and describe how to integrate them into multilingual speech recognizers. The first approach uses hierarchical multilayer perceptrons, in combination with hidden Markov models, to estimate language posterior probabilities given the acoustics. The second approach evaluates the output of a multilingual speech recognizer to determine the spoken language. The research is applied to the MediaParl speech corpus, which was recorded at the Parliament of the canton of Valais, where speakers switch between Swiss French and Swiss German. Our experiments show that, on this particular data set, LID can be used to significantly improve the performance of multilingual speech recognizers. We also point out that ASR-dependent LID approaches yield the best performance due to higher-level cues, and that our systems perform much worse on non-native data.
Keywords:	code-switched speech data, language identification, multilingual speech recognition, neural network features
Projects:	Idiap
Authors:	Caesar, Holger
Attachments
Caesar_Idiap-RR-24-2012.pdf (MD5: 9fbce83b6d749049cdc51e4e095bd434)