CONF Oualil_ICASSP2013_2013/IDIAP
A Probabilistic Framework for Multiple Speaker Localization
Oualil, Youssef; Magimai-Doss, Mathew; Faubel, Friedrich; Klakow, Dietrich
Related documents: http://publications.idiap.ch/index.php/publications/showcite/Oualil_Idiap-RR-37-2012
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013
This paper presents a novel probabilistic framework for localizing multiple speakers with a microphone array. In this framework, the generalized cross correlation function (GCC) of each microphone pair is interpreted as a probability distribution of the time difference of arrival (TDOA) and subsequently approximated as a Gaussian mixture. The distribution parameters are estimated with a weighted expectation maximization algorithm. Then, the joint distribution of the TDOA Gaussian mixtures is mapped to a multimodal distribution in the location space, where each mode represents a potential source location. The approach taken here performs the localization by 1) reducing the search space to some regions that are likely to contain a source and then 2) extracting the actual speaker locations with a numerical optimization algorithm. The effectiveness of the proposed approach is shown using the AV16.3 corpus.

REPORT Oualil_Idiap-RR-37-2012/IDIAP
A Probabilistic Framework for Multiple Speaker Localization
Oualil, Youssef; Magimai-Doss, Mathew; Faubel, Friedrich; Klakow, Dietrich
Oualil, Youssef (Ed.); Magimai-Doss, Mathew (Ed.)
Keywords: Gaussian mixture, localization, microphone arrays, multiple speakers, Steered response power
EXTERNAL http://publications.idiap.ch/attachments/reports/2012/Oualil_Idiap-RR-37-2012.pdf PUBLIC
Idiap-RR-37-2012, Idiap, December 2012
Submitted to ICASSP'13
This paper presents a novel probabilistic framework for localizing multiple speakers with a microphone array.
In this framework, the generalized cross correlation function (GCC) of each microphone pair is interpreted as a probability distribution of the time difference of arrival (TDOA) and subsequently approximated as a Gaussian mixture. The distribution parameters are estimated with a weighted expectation maximization algorithm. Then, the joint distribution of the TDOA Gaussian mixtures is mapped to a multimodal distribution in the location space, where each mode represents a potential source location. The approach taken here performs the localization by 1) reducing the search space to some regions that are likely to contain a source and then 2) extracting the actual speaker locations with a numerical optimization algorithm. The effectiveness of the proposed approach is shown using the AV16.3 corpus.
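
The first two steps of the abstract (reading the GCC of one microphone pair as a TDOA distribution, then fitting a Gaussian mixture with weighted EM) can be illustrated with a minimal sketch. It is not the authors' implementation. The PHAT weighting, the clipping of negative GCC values to zero, the quantile-based initialization, and the names `gcc_phat`, `weighted_em_gmm`, `max_lag`, `k`, and `min_var` are all assumptions made here for illustration. The mapping to the location space and the numerical optimization step are not covered.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """GCC with PHAT weighting (an assumed choice) for one microphone pair.

    Returns integer lags in [-max_lag, max_lag] and the GCC value at each lag.
    A peak at lag d means x1 is delayed by d samples relative to x2.
    Requires 0 < max_lag < (len(x1) + len(x2)) / 2.
    """
    n = len(x1) + len(x2)                      # zero-pad to avoid wrap-around
    cross = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))
    cross /= np.abs(cross) + 1e-12             # keep the phase only
    cc = np.fft.irfft(cross, n=n)
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return np.arange(-max_lag, max_lag + 1), cc


def weighted_em_gmm(tau, w, k, n_iter=100, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture to grid points tau with weights w.

    The weights play the role of the GCC values read as an unnormalized
    TDOA distribution; negative values are clipped (an assumption).
    Returns mixture weights, means, and variances.
    """
    tau = np.asarray(tau, dtype=float)
    w = np.clip(np.asarray(w, dtype=float), 0.0, None)
    w = w / w.sum()

    # Initialize the means at evenly spaced quantiles of the weighted grid.
    cdf = np.cumsum(w)
    idx = np.searchsorted(cdf, (np.arange(k) + 0.5) / k)
    mu = tau[np.clip(idx, 0, len(tau) - 1)].copy()
    var = np.full(k, max(np.sum(w * (tau - np.sum(w * tau)) ** 2), min_var))
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each grid point.
        d = tau[:, None] - mu[None, :]
        logp = np.log(pi) - 0.5 * np.log(2 * np.pi * var) - 0.5 * d ** 2 / var
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: every grid point counts in proportion to its weight.
        rw = r * w[:, None]
        nk = rw.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (rw * tau[:, None]).sum(axis=0) / nk
        var = np.maximum((rw * (tau[:, None] - mu) ** 2).sum(axis=0) / nk, min_var)

    return pi, mu, var
```

A typical call would compute `lags, cc = gcc_phat(x1, x2, max_lag)` for one frame of a microphone pair and pass the result to `weighted_em_gmm(lags, cc, k)`. Each fitted mean is then a candidate TDOA in samples, and dividing by the sampling rate converts it to seconds. The number of components `k` has to be chosen by the caller; the abstract does not say how it is set.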