Visual Speaker Localization Aided by Acoustic Models

Type of publication:	Conference paper
Citation:	Friedland_ACMMM_2009
Booktitle:	ACM Multimedia
Year:	2009
Abstract:	This paper presents a novel audio-visual approach for unsupervised speaker locationing. Using recordings from a single, low-resolution room overview camera and a single far-field microphone, a state-of-the-art audio-only speaker localization system (traditionally called speaker diarization) is extended so that both acoustic and visual models are estimated as part of a joint unsupervised optimization problem. The speaker diarization system first automatically determines the number of speakers and estimates "who spoke when"; then, in a second step, the visual models are used to infer the location of the speakers in the video. The experiments were performed on real-world meetings using 4.5 hours of the publicly available AMI meeting corpus. The proposed system exploits audio-visual integration not only to improve the accuracy of a state-of-the-art (audio-only) speaker diarization system, but also to add visual speaker locationing at little incremental engineering and computational cost.
Keywords:
Projects:	Idiap, AMIDA, IM2
Authors:	Friedland, Gerald; Yeo, Chuohao; Hung, Hayley