CONF
Imseng_ASRU_2011/IDIAP
Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition
Imseng, David
Rasipuram, Ramya
Magimai-Doss, Mathew
EXTERNAL
https://publications.idiap.ch/attachments/papers/2011/Imseng_ASRU_2011.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Imseng_Idiap-RR-01-2012
Related documents
Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Hawaii, USA
2011
348-353
One of the main challenges in non-native speech recognition is how to handle the acoustic variability present in multi-accented non-native speech with a limited amount of training data. In this paper, we investigate an approach that addresses this challenge by using Kullback-Leibler divergence based hidden Markov models (KL-HMM). More precisely, the acoustic variability in the multi-accented speech is handled by using multilingual phoneme posterior probabilities, estimated by a multilayer perceptron trained on auxiliary data, as input features for the KL-HMM system. With limited training data, we then build better acoustic models by exploiting the fact that the KL-HMM system has fewer parameters. On the HIWIRE corpus, the proposed approach yields a word error rate (WER) of 1.9% with 149 minutes of training data and a WER of 5.5% with 2 minutes of training data.
REPORT
Imseng_Idiap-RR-01-2012/IDIAP
Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition
Imseng, David
Rasipuram, Ramya
Magimai-Doss, Mathew
Hidden Markov Model
Kullback-Leibler divergence
multilayer perceptron
Posterior features
EXTERNAL
https://publications.idiap.ch/attachments/reports/2011/Imseng_Idiap-RR-01-2012.pdf
PUBLIC
Idiap-RR-01-2012
2012
Idiap
January 2012
One of the main challenges in non-native speech recognition is how to handle the acoustic variability present in multi-accented non-native speech with a limited amount of training data. In this paper, we investigate an approach that addresses this challenge by using Kullback-Leibler divergence based hidden Markov models (KL-HMM). More precisely, the acoustic variability in the multi-accented speech is handled by using multilingual phoneme posterior probabilities, estimated by a multilayer perceptron trained on auxiliary data, as input features for the KL-HMM system. With limited training data, we then build better acoustic models by exploiting the fact that the KL-HMM system has fewer parameters. On the HIWIRE corpus, the proposed approach yields a word error rate (WER) of 1.9% with 149 minutes of training data and a WER of 5.5% with 2 minutes of training data.
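As a minimal illustration of the modeling idea summarized in the abstract, the sketch below computes the Kullback-Leibler divergence local score between a KL-HMM state's categorical distribution over multilingual phonemes and an MLP posterior feature vector for one frame. It is a sketch of the general KL-HMM scoring principle only; the function and variable names, the 5-class example, and the direction of the divergence are illustrative assumptions and are not taken from the authors' implementation.

    import numpy as np

    def kl_local_score(state_dist, posterior, eps=1e-12):
        """KL divergence D(y_d || z_t) between a KL-HMM state's categorical
        distribution y_d and a multilingual phoneme posterior feature z_t.
        A lower score indicates a better match between state and frame."""
        y = np.clip(np.asarray(state_dist, dtype=float), eps, 1.0)
        z = np.clip(np.asarray(posterior, dtype=float), eps, 1.0)
        return float(np.sum(y * np.log(y / z)))

    # Illustrative example with 5 multilingual phoneme classes (hypothetical values).
    state_dist = np.array([0.70, 0.10, 0.10, 0.05, 0.05])   # trained state distribution
    posterior  = np.array([0.60, 0.15, 0.10, 0.10, 0.05])   # MLP output for one frame
    print(kl_local_score(state_dist, posterior))

Because each state is parameterized by a single categorical distribution rather than Gaussian mixture parameters, the model has comparatively few parameters, which is the property the abstract exploits for training on very small amounts of non-native data.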