AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking
| Type of publication: | Idiap-RR |
| Citation: | lathoud-rr-04-28 |
| Number: | Idiap-RR-28-2004 |
| Year: | 2004 |
| Institution: | IDIAP |
| Address: | Martigny, Switzerland |
| Note: | Published in "Proceedings of the 2004 MLMI Workshop" |
| Abstract: | Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called "AV16.3", along with a method for 3-D location annotation based on calibrated cameras. "16.3" stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner, in a meeting room. Part of this corpus has already been successfully used to report research results. |
| Userfields: | ipdinar={2004}, ipdmembership={speech, vision}, language={English}, |
| Projects: | Idiap |
| Authors: | |
| Crossref by: | lathoud04c |