Idiap Research Institute
Finding Audio-Visual Events in Informal Social Gatherings
Type of publication: Conference paper
Citation: Alameda-Pineda_ICMI_2011
Publication status: Published
Booktitle: IEEE/ACM 13th International Conference on Multimodal Interaction
Year: 2011
Note: Outstanding paper award
Abstract: In this paper we address the problem of detecting and localizing objects that can be both seen and heard, e.g., people. This may be solved within the framework of data clustering. We propose a new multimodal clustering algorithm based on a Gaussian mixture model, where one of the modalities (visual data) is used to supervise the clustering process. This is made possible by mapping both modalities into the same metric space. To this end, we fully exploit the geometric and physical properties of an audio-visual sensor based on binocular vision and binaural hearing. We propose an EM algorithm that is theoretically well justified, intuitive, and extremely efficient from a computational point of view. This efficiency makes the method implementable on advanced platforms such as humanoid robots. We describe in detail tests and experiments performed with publicly available data sets that yield very interesting results.
Keywords: 3D localization, audio-visual fusion, event detection, scene analysis
Projects: Idiap, HUMAVIPS
Authors: Alameda-Pineda, Xavier; Khalidov, Vasil; Horaud, Radu; Forbes, Florence
Attachments
  • Alameda-Pineda_ICMI_2011.pdf
Notes
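The abstract describes clustering with a Gaussian mixture model fitted by EM, with visual data guiding the clustering of observations that share a common metric space. The sketch below is not the authors' algorithm: it is a generic one-dimensional EM for a Gaussian mixture, and it assumes the visual modality contributes only the initial cluster means. The function name `fit_gmm_em`, its parameters, and the one-dimensional observation space are hypothetical choices made for illustration.

```python
import numpy as np


def fit_gmm_em(audio, visual_seeds, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture to `audio` with plain EM.

    Assumption (not from the paper): `visual_seeds` are visual detections
    already mapped into the same 1-D space as the auditory observations,
    and they are used only to initialise one cluster mean each.
    """
    audio = np.asarray(audio, dtype=float)
    means = np.asarray(visual_seeds, dtype=float).copy()
    k = means.size
    variances = np.full(k, audio.var() + min_var)
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each observation,
        # computed in the log domain for numerical stability.
        diff = audio[:, None] - means[None, :]
        log_pdf = -0.5 * (diff ** 2 / variances + np.log(2 * np.pi * variances))
        log_resp = np.log(weights) + log_pdf
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances from the posteriors.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        weights = nk / nk.sum()
        means = (resp * audio[:, None]).sum(axis=0) / nk
        diff = audio[:, None] - means[None, :]
        variances = (resp * diff ** 2).sum(axis=0) / nk + min_var

    return weights, means, variances, resp
```

A call such as `fit_gmm_em(observations, seeds)` returns the mixture weights, means, variances, and the per-observation posteriors. Seeding the means from a second modality fixes the number of clusters in advance, which is one simple way to let that modality steer the fit.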