CONF
gatica02d-conf/IDIAP
Audio-Visual Speaker Tracking with Importance Particle Filters
Gatica-Perez, Daniel
Lathoud, Guillaume
McCowan, Iain A.
Odobez, Jean-Marc
Moore, Darren
EXTERNAL
https://publications.idiap.ch/attachments/reports/2002/rr02-37.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/gatica02d
Related documents
IEEE International Conference on Image Processing (ICIP)
2003
We present a probabilistic methodology for audio-visual (AV) speaker tracking, using an uncalibrated wide-angle camera and a microphone array. The algorithm fuses 2-D object shape and audio information via importance particle filters (I-PFs), allowing for the asymmetrical integration of AV information in a way that efficiently exploits the complementary features of each modality. Audio localization information is used to generate an importance sampling (IS) function, which guides the random search process of a particle filter towards regions of the configuration space likely to contain the true configuration (a speaker). The measurement process integrates contour-based and audio observations, which results in reliable head tracking in realistic scenarios. We show that imperfect single modalities can be combined into an algorithm that automatically initializes and tracks a speaker, switches between multiple speakers, tolerates visual clutter, and recovers from total AV object occlusion, in the context of a multimodal meeting room.
REPORT
gatica02d/IDIAP
Audio-Visual Speaker Tracking with Importance Particle Filters
Gatica-Perez, Daniel
Lathoud, Guillaume
McCowan, Iain A.
Odobez, Jean-Marc
Moore, Darren
EXTERNAL
https://publications.idiap.ch/attachments/reports/2002/rr02-37.pdf
PUBLIC
Idiap-RR-37-2002
2002
IDIAP
We present a probabilistic methodology for audio-visual (AV) speaker tracking, using an uncalibrated wide-angle camera and a microphone array. The algorithm fuses 2-D object shape and audio information via importance particle filters (I-PFs), allowing for the asymmetrical integration of AV information in a way that efficiently exploits the complementary features of each modality. Audio localization information is used to generate an importance sampling (IS) function, which guides the random search process of a particle filter towards regions of the configuration space likely to contain the true configuration (a speaker). The measurement process integrates contour-based and audio observations, which results in reliable head tracking in realistic scenarios. We show that imperfect single modalities can be combined into an algorithm that automatically initializes and tracks a speaker, switches between multiple speakers, tolerates visual clutter, and recovers from total AV object occlusion, in the context of a multimodal meeting room.