CONF Motlicek_INTERSPEECH2013_2013/IDIAP
Title: Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation
Authors: Motlicek, Petr; Imseng, David; Garner, Philip N.
EXTERNAL https://publications.idiap.ch/attachments/papers/2013/Motlicek_INTERSPEECH2013_2013.pdf
PUBLIC
Related documents: https://publications.idiap.ch/index.php/publications/showcite/Motlicek_Idiap-RR-39-2013
Published in: ISCA - International Speech Communication Association - Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013)
Location: Lyon, France
Year: 2013
Publisher: ISCA
Pages: 510-514
ISSN: 2308-457X
Abstract: Recent studies have shown that speech recognizers may benefit from data in languages other than the target language through efficient acoustic model- or feature-level adaptation. Crosslingual Tandem-Subspace Gaussian Mixture Models (SGMM) are successfully able to combine acoustic model- and feature-level adaptation techniques. More specifically, we focus on under-resourced languages (Afrikaans in our case) and perform feature-level adaptation through the estimation of phone class posterior features with a Multilayer Perceptron that was trained on data from a similar language with large amounts of available speech data (Dutch in our case). The same Dutch data can also be exploited on an acoustic model-level by training globally-shared SGMM parameters in a crosslingual way. The two adaptation techniques are indeed complementary and result in a crosslingual Tandem-SGMM system that yields relative improvement of about 22% compared to a standard speech recognizer on an Afrikaans phoneme recognition task. Interestingly, eventual score-level combination of the individual SGMM systems yields additional 3% relative improvement.

REPORT Motlicek_Idiap-RR-39-2013/IDIAP
Title: Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation
Authors: Motlicek, Petr; Imseng, David; Garner, Philip N.
Keywords: Acoustic model adaptation, Automatic Speech Recognition, under-resourced languages
EXTERNAL https://publications.idiap.ch/attachments/reports/2013/Motlicek_Idiap-RR-39-2013.pdf
PUBLIC
Report number: Idiap-RR-39-2013
Year: 2013
Institution: Idiap
Address: Rue Marconi 19, Martigny, Switzerland
Date: November 2013