CONF gatica05c-conf/IDIAP Multimodal Multispeaker Probabilistic Tracking in Meetings Gatica-Perez, Daniel Lathoud, Guillaume Odobez, Jean-Marc McCowan, Iain A. EXTERNAL http://publications.idiap.ch/attachments/reports/2004/rr-04-66.pdf PUBLIC http://publications.idiap.ch/index.php/publications/showcite/gatica05c Related documents Proc. Int. Conf. on Multimodal Interfaces (ICMI) 2005 Tracking speakers in multiparty conversations constitutes a fundamental task for automatic meeting analysis. In this paper, we present a probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room, equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. The model integrates audio-visual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. Approximate inference in our model, needed given its complexity, is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. We present results -based on an objective evaluation procedure- that show that our framework (1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy; (2) can deal with cases of visual clutter and partial occlusion; and (3) significantly outperforms a traditional sampling-based approach. REPORT gatica05c/IDIAP Multimodal Multispeaker Probabilistic Tracking in Meetings Gatica-Perez, Daniel Lathoud, Guillaume Odobez, Jean-Marc McCowan, Iain A.
EXTERNAL http://publications.idiap.ch/attachments/reports/2004/rr-04-66.pdf PUBLIC Idiap-RR-66-2004 2004 IDIAP Martigny, Switzerland in Proc. Int. Conf. on Multimodal Interfaces (ICMI), Trento, Oct. 2005. Tracking speakers in multiparty conversations constitutes a fundamental task for automatic meeting analysis. In this paper, we present a probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room, equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. The model integrates audio-visual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. Approximate inference in our model, needed given its complexity, is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. We present results -based on an objective evaluation procedure- that show that our framework (1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy; (2) can deal with cases of visual clutter and partial occlusion; and (3) significantly outperforms a traditional sampling-based approach.

</datafield>

<subfield code="a">gatica05c-conf/IDIAP</subfield>

</datafield>

<subfield code="a">Multimodal Multispeaker Probabilistic Tracking in Meetings</subfield>

</datafield>

<subfield code="a">Gatica-Perez, Daniel</subfield>

</datafield>

<subfield code="a">Lathoud, Guillaume</subfield>

</datafield>

<subfield code="a">Odobez, Jean-Marc</subfield>

</datafield>

<subfield code="a">McCowan, Iain A.</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/reports/2004/rr-04-66.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="u">http://publications.idiap.ch/index.php/publications/showcite/gatica05c</subfield>

<subfield code="z">Related documents</subfield>

</datafield>

<subfield code="a">Proc. Int. Conf. on Multimodal Interfaces (ICMI)</subfield>

</datafield>

</datafield>

<subfield code="a">Tracking speakers in multiparty conversations constitutes a fundamental task for automatic meeting analysis. In this paper, we present a probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room, equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. The model integrates audio-visual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. Approximate inference in our model, needed given its complexity, is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. We present results -based on an objective evaluation procedure- that show that our framework (1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy; (2) can deal with cases of visual clutter and partial occlusion; and (3) significantly outperforms a traditional sampling-based approach.</subfield>

</datafield>

</record>

<subfield code="a">REPORT</subfield>

</datafield>

<subfield code="a">gatica05c/IDIAP</subfield>

</datafield>

<subfield code="a">Multimodal Multispeaker Probabilistic Tracking in Meetings</subfield>

</datafield>

<subfield code="a">Gatica-Perez, Daniel</subfield>

</datafield>

<subfield code="a">Lathoud, Guillaume</subfield>

</datafield>

<subfield code="a">Odobez, Jean-Marc</subfield>

</datafield>

<subfield code="a">McCowan, Iain A.</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/reports/2004/rr-04-66.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="a">Idiap-RR-66-2004</subfield>

</datafield>

<subfield code="b">IDIAP</subfield>

<subfield code="a">Martigny, Switzerland</subfield>

</datafield>

<subfield code="a">in Proc. Int. Conf. on Multimodal Interfaces (ICMI), Trento, Oct. 2005.</subfield>

</datafield>

</datafield>

</record>

</collection>