How does a dictation machine recognize speech?

Type of publication:	Idiap-RR
Citation:	Bourlard_Idiap-RR-72-2008
Number:	Idiap-RR-72-2008
Year:	2008
Month:	11
Institution:	Idiap
Address:	Centre du Parc, Rue Marconi 19, 1920 Martigny
Abstract:	There is magic (or is it witchcraft?) in a speech recognizer that transcribes continuous radio speech into text, even with a word accuracy of no more than 50%. The extreme difficulty of this task, though, is usually not perceived by the general public. This is because we are almost deaf to the infinite acoustic variations that accompany the production of vocal sounds, which arise from physiological constraints (co-articulation, ...), from the acoustic environment (additive or convolutional noise, Lombard effect, ...), and from the emotional state of the speaker (voice quality, speaking rate, hesitations, etc.). Indeed, we do not become conscious of speech until our brain has processed it into a sequence of meaningful units: phonemes and words. In this chapter we will see how statistical pattern recognition and statistical sequence recognition techniques are currently used to try to mimic this extraordinary faculty of our mind (Section 4.1). We will follow, in Section 4.2, with a MATLAB-based proof of concept of word-based automatic speech recognition (ASR) using Hidden Markov Models (HMMs), with a bigram model for the (syntactic-semantic) language constraints.
Keywords:
Projects:	Idiap
Authors:	Dutoit, T.; Couvreur, L.; Bourlard, Hervé
Crossref by	Bourlard_SPRINGERMA_2008
Added by:	[ADM]
Attachments
Bourlard_Idiap-RR-72-2008.pdf (MD5: f665b9277fea67499e02a1ccfd6bd9f9)