CONF Lecorve_INTERSPEECH-2_2012/IDIAP Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition Lecorvé, Gwénolé Motlicek, Petr ASR Automatic Speech Recognition Language Models recurrent neural network speech decoding weighted finite state transducer WFST http://publications.idiap.ch/index.php/publications/showcite/Lecorve_Idiap-RR-21-2012 Related documents Proceedings of Interspeech Portland, Oregon, USA 2012 to appear Recurrent neural network language models (RNNLMs) have recently been shown to outperform the venerable n-gram language models (LMs). However, in automatic speech recognition (ASR), RNNLMs have not yet been used to directly decode a speech signal. Instead, they are applied to rescore N-best lists generated from word lattices. To use RNNLMs in earlier stages of speech recognition, our work proposes transforming RNNLMs into weighted finite state transducers that approximate their underlying probability distribution. The main idea is to discretize the continuous representations of word histories; we present a first implementation of this approach using clustering techniques and entropy-based pruning. The experimental results on LM perplexity and on ASR word error rates are encouraging, since the performance of the discretized RNNLMs is comparable to that of n-gram LMs.
REPORT Lecorve_Idiap-RR-21-2012/IDIAP Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition Lecorvé, Gwénolé Motlicek, Petr ASR Automatic Speech Recognition Language Models recurrent neural network speech decoding weighted finite state transducer WFST EXTERNAL http://publications.idiap.ch/attachments/reports/2012/Lecorve_Idiap-RR-21-2012.pdf PUBLIC Idiap-RR-21-2012 2012 Idiap July 2012 Recurrent neural network language models (RNNLMs) have recently been shown to outperform the venerable n-gram language models (LMs). However, in automatic speech recognition (ASR), RNNLMs have not yet been used to directly decode a speech signal. Instead, they are applied to rescore N-best lists generated from word lattices. To use RNNLMs in earlier stages of speech recognition, our work proposes transforming RNNLMs into weighted finite state transducers that approximate their underlying probability distribution. The main idea is to discretize the continuous representations of word histories; we present a first implementation of this approach using clustering techniques and entropy-based pruning. The experimental results on LM perplexity and on ASR word error rates are encouraging, since the performance of the discretized RNNLMs is comparable to that of n-gram LMs.

</datafield>

<subfield code="a">Lecorve_INTERSPEECH-2_2012/IDIAP</subfield>

</datafield>

<subfield code="a">Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition</subfield>

</datafield>

<subfield code="a">Lecorvé, Gwénolé</subfield>

</datafield>

<subfield code="a">Motlicek, Petr</subfield>

</datafield>

</datafield>

<subfield code="a">Automatic Speech Recognition</subfield>

</datafield>

<subfield code="a">Language Models</subfield>

</datafield>

<subfield code="a">recurrent neural network</subfield>

</datafield>

<subfield code="a">speech decoding</subfield>

</datafield>

<subfield code="a">weighted finite state transducer</subfield>

</datafield>

</datafield>

<subfield code="u">http://publications.idiap.ch/index.php/publications/showcite/Lecorve_Idiap-RR-21-2012</subfield>

<subfield code="z">Related documents</subfield>

</datafield>

<subfield code="a">Proceedings of Interspeech</subfield>

<subfield code="c">Portland, Oregon, USA</subfield>

</datafield>

</datafield>

<subfield code="c">to appear</subfield>

</datafield>

<subfield code="a">Recurrent neural network language models (RNNLMs) have recently been shown to outperform the venerable n-gram language models (LMs). However, in automatic speech recognition (ASR), RNNLMs have not yet been used to directly decode a speech signal. Instead, they are applied to rescore N-best lists generated from word lattices. To use RNNLMs in earlier stages of speech recognition, our work proposes transforming RNNLMs into weighted finite state transducers that approximate their underlying probability distribution. The main idea is to discretize the continuous representations of word histories; we present a first implementation of this approach using clustering techniques and entropy-based pruning. The experimental results on LM perplexity and on ASR word error rates are encouraging, since the performance of the discretized RNNLMs is comparable to that of n-gram LMs.</subfield>

</datafield>

</record>

<subfield code="a">REPORT</subfield>

</datafield>

<subfield code="a">Lecorve_Idiap-RR-21-2012/IDIAP</subfield>

</datafield>

<subfield code="a">Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition</subfield>

</datafield>

<subfield code="a">Lecorvé, Gwénolé</subfield>

</datafield>

<subfield code="a">Motlicek, Petr</subfield>

</datafield>

</datafield>

<subfield code="a">Automatic Speech Recognition</subfield>

</datafield>

<subfield code="a">Language Models</subfield>

</datafield>

<subfield code="a">recurrent neural network</subfield>

</datafield>

<subfield code="a">speech decoding</subfield>

</datafield>

<subfield code="a">weighted finite state transducer</subfield>

</datafield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/reports/2012/Lecorve_Idiap-RR-21-2012.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="a">Idiap-RR-21-2012</subfield>

</datafield>

<subfield code="b">Idiap</subfield>

</datafield>

</datafield>

</datafield>

</record>

</collection>