Idiap Research Institute
Applying multi- and cross-lingual stochastic phone space transformations to non-native speech recognition
Type of publication: Journal paper
Citation: Imseng_TASLP_2013
Publication status: Accepted
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Year: 2013
DOI: 10.1109/TASL.2013.2260150
Abstract: In the context of hybrid HMM/MLP Automatic Speech Recognition (ASR), this paper describes an investigation into a new type of stochastic phone space transformation, which maps “source” phone (or phone HMM state) posterior probabilities (as obtained at the output of a Multilayer Perceptron/MLP) into “destination” phone (or phone HMM state) posterior probabilities. The resulting stochastic matrix transformation can be used within the same language to automatically adapt to different phone formats (e.g., IPA) or across languages. Additionally, as shown here, it can also be applied successfully to non-native speech recognition. In the same spirit as MLLR adaptation or MLP adaptation, the proposed approach directly maps posterior distributions, and is trained on a small amount of adaptation data by optimizing a Kullback–Leibler-based cost function with a modified version of an iterative EM algorithm. On a non-native English database (HIWIRE), and comparing with multiple setups (monophone and triphone mapping, MLLR adaptation), we show that the resulting posterior mapping yields state-of-the-art results using very limited amounts of adaptation data in mono-, cross- and multi-lingual setups. We also show that “universal” phone posteriors, trained on a large amount of multilingual data, can be transformed to English phone posteriors, resulting in an ASR system that significantly outperforms a system trained on English data only. Finally, we demonstrate that the proposed approach outperforms alternative data-driven mapping techniques as well as a knowledge-based mapping technique.
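The posterior mapping described in the abstract can be illustrated with a minimal sketch. It assumes that the transformation is a column-stochastic matrix W with W[d][s] read as P(destination d | source s), so that destination posteriors are q_t = W p_t; that the adaptation targets z_t are destination-phone labels (one-hot here); and that training minimizes the summed divergence KL(z_t || W p_t) with the standard EM update for this latent-variable model. The paper uses a modified EM algorithm whose details are not given in this record, so this is not the authors' method; the function names and the toy data are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # W[d][s] = P(destination d | source s); columns sum to 1


def map_posteriors(W: Matrix, p: Vector) -> Vector:
    """Destination posteriors q(d) = sum_s W[d][s] * p(s)."""
    return [sum(w_ds * p_s for w_ds, p_s in zip(row, p)) for row in W]


def kl_cost(targets: List[Vector], sources: List[Vector], W: Matrix, eps: float = 1e-12) -> float:
    """Sum over frames of KL(z_t || W p_t); terms with z_t(d) = 0 contribute nothing."""
    total = 0.0
    for z, p in zip(targets, sources):
        q = map_posteriors(W, p)
        total += sum(z_d * math.log(z_d / max(q_d, eps)) for z_d, q_d in zip(z, q) if z_d > 0)
    return total


def em_step(targets: List[Vector], sources: List[Vector], W: Matrix, eps: float = 1e-12) -> Matrix:
    """One EM iteration for W (standard update, not the paper's modified variant)."""
    n_dst, n_src = len(W), len(W[0])
    acc = [[0.0] * n_src for _ in range(n_dst)]
    for z, p in zip(targets, sources):
        q = map_posteriors(W, p)
        for d in range(n_dst):
            if z[d] == 0.0:
                continue
            for s in range(n_src):
                # E-step: share of destination d in this frame explained by source s
                acc[d][s] += z[d] * W[d][s] * p[s] / max(q[d], eps)
    # M-step: renormalize each column so that sum_d W[d][s] = 1
    new_W = [[0.0] * n_src for _ in range(n_dst)]
    for s in range(n_src):
        col = sum(acc[d][s] for d in range(n_dst))
        for d in range(n_dst):
            new_W[d][s] = acc[d][s] / col if col > 0 else W[d][s]
    return new_W


# Toy adaptation data (hypothetical): 3 source phones mapped to 2 destination phones.
sources = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
    [0.3, 0.4, 0.3],
]
targets = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

W = [[0.5] * 3 for _ in range(2)]  # uniform column-stochastic initialization
costs = [kl_cost(targets, sources, W)]
for _ in range(20):
    W = em_step(targets, sources, W)
    costs.append(kl_cost(targets, sources, W))
print(f"KL cost: {costs[0]:.4f} -> {costs[-1]:.4f}")
```

Because every column of W is renormalized to sum to one, W p_t remains a valid probability distribution for any source posterior vector p_t, and each EM iteration should not increase the KL cost. The multiplicative form of the update also keeps W non-negative without any explicit constraint handling.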
Keywords:
Projects: Idiap, SNSF-MULTI, IM2, EMIME
Authors: Imseng, David; Bourlard, Hervé; Dines, John; Garner, Philip N.; Magimai.-Doss, Mathew
Attachments: Imseng_TASLP_2013.pdf