CONF
Lecorve_INTERSPEECH-2_2012/IDIAP
Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition
Lecorvé, Gwénolé
Motlicek, Petr
ASR
Automatic Speech Recognition
Language Models
recurrent neural network
speech decoding
weighted finite state transducer
WFST
https://publications.idiap.ch/index.php/publications/showcite/Lecorve_Idiap-RR-21-2012
Related documents
Proceedings of Interspeech
Portland, Oregon, USA
2012
to appear
Recurrent neural network language models (RNNLMs) have recently been shown to outperform the venerable n-gram language models (LMs). However, in automatic speech recognition (ASR), RNNLMs have not yet been used to directly decode a speech signal. Instead, they are applied to rescore N-best lists generated from word lattices. To use RNNLMs in earlier stages of speech recognition, our work proposes to transform RNNLMs into weighted finite state transducers (WFSTs) that approximate their underlying probability distribution. The main idea consists in discretizing the continuous representations of word histories; we present a first implementation of the approach using clustering techniques and entropy-based pruning. Experimental results on LM perplexity and on ASR word error rates are encouraging, since the performance of the discretized RNNLMs is comparable to that of n-gram LMs.
REPORT
Lecorve_Idiap-RR-21-2012/IDIAP
Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition
Lecorvé, Gwénolé
Motlicek, Petr
ASR
Automatic Speech Recognition
Language Models
recurrent neural network
speech decoding
weighted finite state transducer
WFST
EXTERNAL
https://publications.idiap.ch/attachments/reports/2012/Lecorve_Idiap-RR-21-2012.pdf
PUBLIC
Idiap-RR-21-2012
2012
Idiap
July 2012
Recurrent neural network language models (RNNLMs) have recently been shown to outperform the venerable n-gram language models (LMs). However, in automatic speech recognition (ASR), RNNLMs have not yet been used to directly decode a speech signal. Instead, they are applied to rescore N-best lists generated from word lattices. To use RNNLMs in earlier stages of speech recognition, our work proposes to transform RNNLMs into weighted finite state transducers (WFSTs) that approximate their underlying probability distribution. The main idea consists in discretizing the continuous representations of word histories; we present a first implementation of the approach using clustering techniques and entropy-based pruning. Experimental results on LM perplexity and on ASR word error rates are encouraging, since the performance of the discretized RNNLMs is comparable to that of n-gram LMs.