CONF
lathoud05a/IDIAP
A Sector-Based, Frequency-Domain Approach to Detection and Localization of Multiple Speakers
Lathoud, Guillaume
Magimai.-Doss, Mathew
EXTERNAL
https://publications.idiap.ch/attachments/papers/2005/lathoud05a.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/lathoud-rr-04-54
Related documents
Proceedings of ICASSP 2005
2005
Philadelphia, USA
March 2005
IDIAP-RR 04-54
Detection and localization of speakers with microphone arrays is a difficult task due to the wideband nature of speech signals, the large amount of overlap between speakers in spontaneous conversations, and the presence of noise sources. Many existing audio multi-source localization methods rely on prior knowledge of the sectors containing active sources and/or the number of active sources. This paper proposes sector-based, frequency-domain approaches that address both detection and localization problems by measuring relative phases between microphones. The first approach is similar to delay-sum beamforming. The second approach is novel: it relies on systematic optimization of a centroid in phase space, for each sector. It provides major, systematic improvement over the first approach as well as over previous work. Very good results are obtained on more than one hour of recordings in real meeting room conditions, including cases with up to 3 concurrent speakers.
REPORT
lathoud-rr-04-54/IDIAP
A Sector-Based, Frequency-Domain Approach to Detection and Localization of Multiple Speakers
Lathoud, Guillaume
Magimai.-Doss, Mathew
EXTERNAL
https://publications.idiap.ch/attachments/reports/2004/rr-04-54.pdf
PUBLIC
Idiap-RR-54-2004
2004
IDIAP
Martigny, Switzerland
To appear in "Proceedings of ICASSP 2005"
Detection and localization of speakers with microphone arrays is a difficult task due to the wideband nature of speech signals, the large amount of overlap between speakers in spontaneous conversations, and the presence of noise sources. Many existing audio multi-source localization methods rely on prior knowledge of the sectors containing active sources and/or the number of active sources. This paper proposes sector-based, frequency-domain approaches that address both detection and localization problems by measuring relative phases between microphones. The first approach is similar to delay-sum beamforming. The second approach is novel: it relies on systematic optimization of a centroid in phase space, for each sector. It provides major, systematic improvement over the first approach as well as over previous work. Very good results are obtained on more than one hour of recordings in real meeting room conditions, including cases with up to 3 concurrent speakers.