<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">REPORT</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Motlicek_Idiap-RR-37-2013/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">FEATURE AND SCORE LEVEL COMBINATION OF SUBSPACE GAUSSIANS IN LVCSR TASK</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Motlicek, Petr</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Povey, Daniel</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Karafiat, Martin</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Automatic Speech Recognition</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Discriminative features</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">System Combination</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2="0">
			<subfield code="i">EXTERNAL</subfield>
			<subfield code="u">http://publications.idiap.ch/attachments/reports/2013/Motlicek_Idiap-RR-37-2013.pdf</subfield>
			<subfield code="x">PUBLIC</subfield>
		</datafield>
		<datafield tag="088" ind1=" " ind2=" ">
			<subfield code="a">Idiap-RR-37-2013</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2013</subfield>
			<subfield code="b">Idiap</subfield>
			<subfield code="a">Rue Marconi 19, Martigny, Switzerland</subfield>
		</datafield>
		<datafield tag="771" ind1="2" ind2=" ">
			<subfield code="d">November 2013</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">In this paper, we investigate the use of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, we first focus on exploiting various types of complex features estimated using a neural network, combined with conventional cepstral features, and modeled by standard HMM/GMMs and SGMMs. Then, the outputs (word sequences) from individual recognizers trained on different features are also combined at the score level using ROVER for both acoustic modeling techniques. Experimental results indicate three important findings: (1) SGMMs consistently outperform HMM/GMMs (a relative improvement of about 6% on average in terms of WER) when both techniques are applied to single features; (2) SGMMs benefit much less from feature-level combination (1% relative improvement) than HMM/GMMs (4% relative improvement), which can eventually match the performance of SGMMs; (3) SGMMs can be significantly improved when individual systems are combined at the score level. This suggests that the SGMM systems provide complementary recognition outputs. Overall relative improvements of the combined SGMM and HMM/GMM systems are 21% and 17%, respectively, compared to a standard ASR baseline.</subfield>
		</datafield>
	</record>
</collection>