AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking

Type of publication:	Idiap-RR
Citation:	lathoud-rr-04-28
Number:	Idiap-RR-28-2004
Year:	2004
Institution:	IDIAP
Address:	Martigny, Switzerland
Note:	Published in "Proceedings of the 2004 MLMI Workshop"
Abstract:	Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called "AV16.3", along with a method for 3-D location annotation based on calibrated cameras. "16.3" stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner in a meeting room. Part of this corpus has already been used successfully to report research results.
Userfields:	ipdinar={2004}, ipdmembership={speech, vision}, language={English}
Keywords:
Projects:	Idiap
Authors:	Lathoud, Guillaume; Odobez, Jean-Marc; Gatica-Perez, Daniel
Crossref by:	lathoud04c
Attachments:	rr-04-28.pdf, rr-04-28.ps.gz