CONF
astrid-01-10a/IDIAP
Error Correcting Posterior Combination for Robust Multi-Band Speech Recognition
Hagen, Astrid
Bourlard, Hervé
AFC
error correction
FC
full combination
HMM/ANN-Hybrid
multi-band
weighting
EXTERNAL
https://publications.idiap.ch/attachments/reports/2001/rr01-10.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/astrid-01-10
Related documents
EUROSPEECH
2001
257-260
In human perception, the availability of context enhances recognition and makes it more robust to noise. Even when not all phonemes in a word (or words in a sentence, etc.) are perceived correctly, humans can fill in the missing parts using cues from the surrounding speech. This has been shown in studies on human speech perception, in which recognition of words in sentences under noise outperformed recognition of words in isolation and, even more markedly, of nonsense syllables under noise. A model for quantifying the influence of contextual information on human recognition performance was recently proposed. Although its authors state that it is not a model of the recognition process itself, we show how the ideas behind it can be used in automatic speech recognition to extend our previously introduced multi-band recognition systems so that they incorporate contextual information across frequency. We compare the new set-up to our earlier models, such as the full combination subband approach and its approximation.
REPORT
astrid-01-10/IDIAP
Error Correcting Posterior Combination for Robust Multi-Band Speech Recognition
Hagen, Astrid
Bourlard, Hervé
AFC
error correction
FC
full combination
HMM/ANN-Hybrid
multi-band
weighting
EXTERNAL
https://publications.idiap.ch/attachments/reports/2001/rr01-10.pdf
PUBLIC
Idiap-RR-10-2001
2001
IDIAP
Martigny, Switzerland
March 2001
In human perception, the availability of context enhances recognition and makes it more robust to noise. Even when not all phonemes in a word (or words in a sentence, etc.) are perceived correctly, humans can fill in the missing parts using cues from the surrounding speech. This has been shown in studies on human speech perception, in which recognition of words in sentences under noise outperformed recognition of words in isolation and, even more markedly, of nonsense syllables under noise. A model for quantifying the influence of contextual information on human recognition performance was recently proposed. Although its authors state that it is not a model of the recognition process itself, we show how the ideas behind it can be used in automatic speech recognition to extend our previously introduced multi-band recognition systems so that they incorporate contextual information across frequency. We compare the new set-up to our earlier models, such as the full combination subband approach and its approximation.