CONF
hari-rr-05-03b/IDIAP
Speech Acquisition in Meetings with an Audio-Visual Sensor Array
McCowan, Iain A.
Krishna, Maganti Hari
Gatica-Perez, Daniel
Moore, Darren
Ba, Silèye O.
EXTERNAL
https://publications.idiap.ch/attachments/reports/2005/hari-icme05.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/hari-rr-05-03
Related documents
Proc. IEEE ICME
2005
IDIAP-RR 05-03
Close-talking headset microphones have traditionally been used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio (needed for recognition tasks) than single distant microphones. However, in multi-party conversational settings such as meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free mode of operation. In this article, we investigate the joint use of a small table-top microphone array and a camera array for speaker tracking and speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for speech enhancement. We compare the resulting speech signal with the signals obtained from single close-talking and table-top microphones, and discuss its features.
REPORT
hari-rr-05-03/IDIAP
Speech Acquisition in Meetings with an Audio-Visual Sensor Array
McCowan, Iain A.
Krishna, Maganti Hari
Gatica-Perez, Daniel
Moore, Darren
Ba, Silèye O.
EXTERNAL
https://publications.idiap.ch/attachments/reports/2005/rr-05-03.pdf
PUBLIC
Idiap-RR-03-2005
2005
IDIAP
Martigny, Switzerland
Published in ``Proc. IEEE ICME'', July 2005
Close-talking headset microphones have traditionally been used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio (needed for recognition tasks) than single distant microphones. However, in multi-party conversational settings such as meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free mode of operation. In this article, we investigate the joint use of a small table-top microphone array and a camera array for speaker tracking and speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for speech enhancement. We compare the resulting speech signal with the signals obtained from single close-talking and table-top microphones, and discuss its features.