CONF
icslp2002/IDIAP
Low cost duration modelling for noise robust speech recognition
Morris, Andrew
Payne, Simon
Bourlard, Hervé
duration models
HMMs
noise robust ASR
EXTERNAL
https://publications.idiap.ch/attachments/reports/2002/morris-2002-icslp.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/morris-rr-02-08
Related documents
Proc. ICSLP
2002
Denver, Colorado, USA
State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit geometric state duration distributions which they model do not accurately reflect true duration distributions. The other is that they impose no hard limit on maximum duration, with the result that state transition probabilities often have little influence when combined with acoustic probabilities, which are of a different order of magnitude. Explicit duration models were developed in the past to address the first problem. These were not widely taken up because their performance advantage in clean speech recognition was often too small to offset the extra complexity they introduced. However, duration models have much greater potential when applied to noisy speech recognition. In this paper we present a simple and generic form of explicit duration model and show that it leads to strong performance improvements when applied to connected digit recognition in noise.
REPORT
morris-RR-02-08/IDIAP
Low cost duration modelling for noise robust speech recognition
Morris, Andrew
Payne, Simon
Bourlard, Hervé
duration models
HMMs
noise robust ASR
EXTERNAL
https://publications.idiap.ch/attachments/reports/2002/rr02-08.pdf
PUBLIC
Idiap-RR-08-2002
2002
IDIAP
State transition matrices as used in standard HMM decoders have two widely perceived limitations. One is that the implicit geometric state duration distributions which they model do not accurately reflect true duration distributions. The other is that they impose no hard limit on maximum duration, with the result that state transition probabilities often have little influence when combined with acoustic probabilities, which are of a different order of magnitude. Explicit duration models were developed in the past to address the first problem. These were not widely taken up because their performance advantage in clean speech recognition was often too small to offset the extra complexity they introduced. However, duration models have much greater potential when applied to noisy speech recognition. In this paper we present a simple and generic form of explicit duration model and show that it leads to strong performance improvements when applied to connected digit recognition in noise.