CONF mccowan-rr-02-09-proc/IDIAP
Robust Speech Recognition with Small Microphone Arrays using the Missing Data Approach
McCowan, Iain A.
Morris, Andrew
Bourlard, Hervé
EXTERNAL https://publications.idiap.ch/attachments/reports/2002/rr02-09.pdf
PUBLIC https://publications.idiap.ch/index.php/publications/showcite/mccowan-rr-02-09 Related documents
Proceedings of International Conference on Speech and Language Processing (ICSLP)
2002
Martigny, Switzerland
2181-2184
IDIAP-RR 02-09
Traditional microphone array speech recognition systems simply recognise the enhanced output of the array. As the level of signal enhancement depends on the number of microphones, such systems do not achieve acceptable speech recognition performance for arrays having only a few microphones. For small microphone arrays, we instead propose using the enhanced output to estimate a reliability mask, which is then used in missing data speech recognition. In missing data speech recognition, the decoded sequence depends on the reliability of each input feature. This reliability is usually based on the signal to noise ratio in each frequency band. In this paper, we use the energy difference between the noisy input and the enhanced output of a small microphone array to determine the frequency band reliability. Recognition experiments with a small array demonstrate the effectiveness of the technique, compared to both traditional microphone array enhancement and a baseline missing data system.

REPORT mccowan-rr-02-09/IDIAP
Robust Speech Recognition with Small Microphone Arrays using the Missing Data Approach
McCowan, Iain A.
Morris, Andrew
Bourlard, Hervé
EXTERNAL https://publications.idiap.ch/attachments/reports/2002/rr02-09.pdf
PUBLIC
Idiap-RR-09-2002
2002
IDIAP
Martigny, Switzerland
Published in Proceedings of ICSLP
Traditional microphone array speech recognition systems simply recognise the enhanced output of the array.
As the level of signal enhancement depends on the number of microphones, such systems do not achieve acceptable speech recognition performance for arrays having only a few microphones. For small microphone arrays, we instead propose using the enhanced output to estimate a reliability mask, which is then used in missing data speech recognition. In missing data speech recognition, the decoded sequence depends on the reliability of each input feature. This reliability is usually based on the signal to noise ratio in each frequency band. In this paper, we use the energy difference between the noisy input and the enhanced output of a small microphone array to determine the frequency band reliability. Recognition experiments with a small array demonstrate the effectiveness of the technique, compared to both traditional microphone array enhancement and a baseline missing data system.
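
The mask estimation described in the abstract can be illustrated with a minimal sketch. It assumes that per-band energies of the noisy input and of the array's enhanced output are already available for each frame. The function names, the 3 dB threshold, and the rule that a small energy drop marks a band as reliable are illustrative assumptions, not details taken from the paper.

```python
import math


def band_energies_db(frames):
    """Convert per-band linear energies to dB, flooring to avoid log(0)."""
    return [[10.0 * math.log10(max(e, 1e-12)) for e in frame] for frame in frames]


def reliability_mask(noisy, enhanced, threshold_db=3.0):
    """Return a binary mask with 1 where a band is treated as reliable.

    noisy, enhanced: lists of frames, each a list of per-band linear energies.
    threshold_db: hypothetical cut-off, not a value from the paper.
    """
    noisy_db = band_energies_db(noisy)
    enhanced_db = band_energies_db(enhanced)
    mask = []
    for n_frame, e_frame in zip(noisy_db, enhanced_db):
        # Assumption: a small energy drop means the array removed little,
        # so the band is taken to be dominated by the target speech.
        mask.append([1 if (n - e) < threshold_db else 0
                     for n, e in zip(n_frame, e_frame)])
    return mask


# Two frames, three frequency bands (made-up energies for illustration).
noisy = [[1.0, 4.0, 0.5], [2.0, 2.0, 2.0]]
enhanced = [[0.9, 1.0, 0.1], [1.9, 0.2, 1.5]]
mask = reliability_mask(noisy, enhanced)
print(mask)
```

In a missing data recogniser, a mask of this kind would tell the decoder which spectral features to trust. A soft mask, for example a sigmoid of the energy difference, would be an equally plausible variant of the same idea.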