CHAPTER
Bourlard_SPRINGERMA_2008/IDIAP
How does a dictation machine recognize speech?
Dutoit, T.
Couvreur, L.
Bourlard, Hervé
EXTERNAL
https://publications.idiap.ch/attachments/papers/2008/Bourlard_SPRINGERMA_2008.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Bourlard_Idiap-RR-72-2008
Related documents
Applied Signal Processing--A MATLAB approach
2008
Springer MA
104-148
REPORT
Bourlard_Idiap-RR-72-2008/IDIAP
How does a dictation machine recognize speech?
Dutoit, T.
Couvreur, L.
Bourlard, Hervé
EXTERNAL
https://publications.idiap.ch/attachments/reports/2008/Bourlard_Idiap-RR-72-2008.pdf
PUBLIC
Idiap-RR-72-2008
2008
Idiap
Centre du Parc, Rue Marconi 19, 1920 Martigny
November 2008
There is magic (or is it witchcraft?) in a speech recognizer that transcribes continuous radio speech into text, even when its word accuracy is no more than 50%. The extreme difficulty of this task, though, is usually not perceived by the general public. This is because we are almost deaf to the infinite acoustic variations that accompany the production of vocal sounds, which arise from physiological constraints (co-articulation), but also from the acoustic environment (additive or convolutional noise, Lombard effect), or from the emotional state of the speaker (voice quality, speaking rate, hesitations, etc.). Our consciousness of speech is indeed not stimulated until after it has been processed by our brain to make it appear as a sequence of meaningful units: phonemes and words. In this chapter we will see how statistical pattern recognition and statistical sequence recognition techniques are currently used to try to mimic this extraordinary faculty of our mind (Section 4.1). We will follow, in Section 4.2, with a MATLAB-based proof of concept of word-based automatic speech recognition (ASR) built on Hidden Markov Models (HMMs), using a bigram model to capture (syntactic-semantic) language constraints.