Multimodal Multispeaker Probabilistic Tracking in Meetings
Type of publication: | Idiap-RR |
Citation: | gatica05c |
Number: | Idiap-RR-66-2004 |
Year: | 2004 |
Institution: | IDIAP |
Address: | Martigny, Switzerland |
Note: | in Proc. Int. Conf. on Multimodal Interfaces (ICMI), Trento, Oct. 2005. |
Abstract: | Tracking speakers in multiparty conversations constitutes a fundamental task for automatic meeting analysis. In this paper, we present a probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room, equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. The model integrates audio-visual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. Approximate inference in our model, needed given its complexity, is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. We present results, based on an objective evaluation procedure, that show that our framework (1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy; (2) can deal with cases of visual clutter and partial occlusion; and (3) significantly outperforms a traditional sampling-based approach. |
Userfields: | ipdmembership={speech, vision}, language={English}, |
Projects |
Idiap |
Authors | |
Crossref by |
gatica05c-conf |