CONF
Oualil_ICASSP2013_2013/IDIAP
A Probabilistic Framework for Multiple Speaker Localization
Oualil, Youssef
Magimai-Doss, Mathew
Faubel, Friedrich
Klakow, Dietrich
https://publications.idiap.ch/index.php/publications/showcite/Oualil_Idiap-RR-37-2012
Related documents
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
2013
This paper presents a novel probabilistic framework for localizing multiple
speakers with a microphone array. In this framework, the generalized
cross-correlation (GCC) function of each microphone pair is interpreted as a probability
distribution of the time difference of arrival (TDOA) and subsequently approximated by
a Gaussian mixture. The distribution parameters are estimated with a weighted
expectation-maximization algorithm. The joint distribution of the TDOA Gaussian
mixtures is then mapped to a multimodal distribution in the location space, where
each mode represents a potential source location. The approach taken here
performs the localization by 1) reducing the search space to regions
that are likely to contain a source and then 2) extracting the actual
speaker locations with a numerical optimization algorithm. The effectiveness
of the proposed approach is demonstrated on the AV16.3 corpus.
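
As an illustration of the first step described in the abstract, the sketch below fits a one-dimensional Gaussian mixture to a GCC-like curve with a weighted expectation-maximization loop. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `weighted_em_gmm`, the clipping of negative correlation values to zero, the quantile-based initialization, and the synthetic two-peak curve (TDOA measured in samples) are all hypothetical choices made for the example.

```python
import numpy as np

def weighted_em_gmm(taus, weights, n_components=2, n_iter=200, min_var=1e-6):
    """Fit a 1-D Gaussian mixture to grid points `taus` weighted by `weights`.

    Assumption: `weights` plays the role of a GCC curve; negative values are
    clipped to zero and the result is normalized to sum to one.
    """
    taus = np.asarray(taus, dtype=float)
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    w = w / w.sum()

    # Initialize the means at evenly spaced quantiles of the weighted CDF.
    cdf = np.cumsum(w)
    q = (np.arange(n_components) + 0.5) / n_components
    idx = np.minimum(np.searchsorted(cdf, q), len(taus) - 1)
    means = taus[idx].copy()

    # Start every component with the global weighted variance and equal weight.
    global_mean = np.sum(w * taus)
    global_var = max(np.sum(w * (taus - global_mean) ** 2), min_var)
    variances = np.full(n_components, global_var)
    pis = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each grid point.
        diff = taus[:, None] - means[None, :]
        dens = pis * np.exp(-0.5 * diff**2 / variances) / np.sqrt(2.0 * np.pi * variances)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)

        # M-step: each grid point contributes in proportion to its weight.
        wr = w[:, None] * resp
        nk = wr.sum(axis=0) + 1e-300
        means = (wr * taus[:, None]).sum(axis=0) / nk
        variances = (wr * (taus[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, min_var)
        pis = nk / nk.sum()

    return pis, means, variances

# Synthetic GCC-like curve with two peaks (TDOA in samples, hypothetical data).
taus = np.linspace(-20.0, 20.0, 401)
gcc_like = (0.6 * np.exp(-0.5 * (taus + 5.0) ** 2 / 1.0**2) / 1.0
            + 0.4 * np.exp(-0.5 * (taus - 8.0) ** 2 / 1.5**2) / 1.5)

pis, means, variances = weighted_em_gmm(taus, gcc_like, n_components=2)
print("weights:", pis)
print("means:", means)
print("std devs:", np.sqrt(variances))
```

Each mixture component then summarizes one candidate TDOA for the microphone pair, with its weight and variance indicating how pronounced and how sharp the corresponding GCC peak is.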
REPORT
Oualil_Idiap-RR-37-2012/IDIAP
A Probabilistic Framework for Multiple Speaker Localization
Oualil, Youssef
Magimai-Doss, Mathew
Faubel, Friedrich
Klakow, Dietrich
Oualil, Youssef
Ed.
Magimai-Doss, Mathew
Ed.
Gaussian mixture
localization
microphone arrays
multiple speakers
Steered response power
EXTERNAL
https://publications.idiap.ch/attachments/reports/2012/Oualil_Idiap-RR-37-2012.pdf
PUBLIC
Idiap-RR-37-2012
2012
Idiap
December 2012
Submitted to ICASSP'13
This paper presents a novel probabilistic framework for localizing multiple
speakers with a microphone array. In this framework, the generalized
cross-correlation (GCC) function of each microphone pair is interpreted as a probability
distribution of the time difference of arrival (TDOA) and subsequently approximated by
a Gaussian mixture. The distribution parameters are estimated with a weighted
expectation-maximization algorithm. The joint distribution of the TDOA Gaussian
mixtures is then mapped to a multimodal distribution in the location space, where
each mode represents a potential source location. The approach taken here
performs the localization by 1) reducing the search space to regions
that are likely to contain a source and then 2) extracting the actual
speaker locations with a numerical optimization algorithm. The effectiveness
of the proposed approach is demonstrated on the AV16.3 corpus.
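
The mapping from per-pair TDOA mixtures to the location space can be sketched as follows. This is only an illustrative sketch, not the method of the paper: it assumes that the microphone pairs are treated as independent, so that the joint score of a candidate location is the sum of log mixture densities evaluated at the TDOAs that location would produce. The names `tdoa`, `gmm_pdf`, and `location_log_score`, the speed of sound of 343 m/s, the four-microphone geometry, and the single-component mixtures are assumptions introduced for the example.

```python
import itertools
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air

def tdoa(x, mic_a, mic_b, c=SPEED_OF_SOUND):
    """TDOA (seconds) that a source at `x` would produce for one microphone pair."""
    return (np.linalg.norm(x - mic_a) - np.linalg.norm(x - mic_b)) / c

def gmm_pdf(tau, pis, means, variances):
    """Density of a 1-D Gaussian mixture at the scalar `tau`."""
    comp = np.exp(-0.5 * (tau - means) ** 2 / variances) / np.sqrt(2.0 * np.pi * variances)
    return float(np.sum(pis * comp))

def location_log_score(x, mics, pairs, mixtures):
    """Sum of log TDOA-mixture densities over all pairs (independence assumed)."""
    score = 0.0
    for (a, b), (pis, means, variances) in zip(pairs, mixtures):
        tau = tdoa(x, mics[a], mics[b])
        score += np.log(gmm_pdf(tau, pis, means, variances) + 1e-300)
    return score

# Hypothetical geometry: four microphones on a small cross, one source.
mics = np.array([[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0],
                 [0.0, 0.1, 0.0], [0.0, -0.1, 0.0]])
pairs = list(itertools.combinations(range(len(mics)), 2))
source = np.array([1.0, 0.5, 0.3])

# One-component mixtures centered on the true TDOAs (std of 10 microseconds).
mixtures = [
    (np.array([1.0]),
     np.array([tdoa(source, mics[a], mics[b])]),
     np.array([(1e-5) ** 2]))
    for a, b in pairs
]

elsewhere = np.array([-0.8, 1.0, 0.3])
print("score at source:   ", location_log_score(source, mics, pairs, mixtures))
print("score at elsewhere:", location_log_score(elsewhere, mics, pairs, mixtures))
```

With multi-component mixtures the same score becomes multimodal over the location space, and a general-purpose numerical optimizer started inside a promising region could then be used to refine each mode into a location estimate.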