CONF
Motlicek_ICASSP2013_2013/IDIAP
FEATURE AND SCORE LEVEL COMBINATION OF SUBSPACE GAUSSIANS IN LVCSR TASK
Motlicek, Petr
Povey, Daniel
Karafiat, Martin
EXTERNAL
https://publications.idiap.ch/attachments/papers/2013/Motlicek_ICASSP2013_2013.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Motlicek_Idiap-RR-37-2013
Related documents
IEEE - The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Vancouver, BC, Canada
2013
7604-7608
1520-6149
10.1109/ICASSP.2013.6639142
doi
In this paper, we investigate the use of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, we first focus on exploiting various types of complex features estimated using neural networks, combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, outputs (word sequences) from individual recognizers trained using different features are combined at the score level using ROVER for both acoustic modeling techniques. Experimental results indicate three important findings: (1) SGMMs consistently outperform HMM/GMMs (average relative improvement of about 6% in terms of WER) when both techniques are applied to single features; (2) SGMMs benefit much less from feature-level combination (1% relative improvement) than HMM/GMMs (4% relative improvement), which can eventually match the performance of SGMMs; (3) SGMMs can be significantly improved when individual systems are combined at the score level. This suggests that the SGMM systems provide complementary recognition outputs. Overall relative improvements of the combined SGMM and HMM/GMM systems are 21% and 17%, respectively, compared to a standard ASR baseline.
REPORT
Motlicek_Idiap-RR-37-2013/IDIAP
FEATURE AND SCORE LEVEL COMBINATION OF SUBSPACE GAUSSIANS IN LVCSR TASK
Motlicek, Petr
Povey, Daniel
Karafiat, Martin
Automatic Speech Recognition
Discriminative features
System Combination
EXTERNAL
https://publications.idiap.ch/attachments/reports/2013/Motlicek_Idiap-RR-37-2013.pdf
PUBLIC
Idiap-RR-37-2013
2013
Idiap
Rue Marconi 19, Martigny, Switzerland
November 2013
In this paper, we investigate the use of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, we first focus on exploiting various types of complex features estimated using neural networks, combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, outputs (word sequences) from individual recognizers trained using different features are combined at the score level using ROVER for both acoustic modeling techniques. Experimental results indicate three important findings: (1) SGMMs consistently outperform HMM/GMMs (average relative improvement of about 6% in terms of WER) when both techniques are applied to single features; (2) SGMMs benefit much less from feature-level combination (1% relative improvement) than HMM/GMMs (4% relative improvement), which can eventually match the performance of SGMMs; (3) SGMMs can be significantly improved when individual systems are combined at the score level. This suggests that the SGMM systems provide complementary recognition outputs. Overall relative improvements of the combined SGMM and HMM/GMM systems are 21% and 17%, respectively, compared to a standard ASR baseline.