CONF
lathoud04c/IDIAP
AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking
Lathoud, Guillaume
Odobez, Jean-Marc
Gatica-Perez, Daniel
EXTERNAL http://publications.idiap.ch/attachments/papers/2004/lathoud04c.pdf PUBLIC
http://publications.idiap.ch/index.php/publications/showcite/lathoud-rr-04-28 Related documents
Proceedings of the 2004 MLMI Workshop, S. Bengio and H. Bourlard Eds, Springer Verlag
2005
IDIAP-RR 04-28
Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground-truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called ``AV16.3'', along with a method for 3-D location annotation based on calibrated cameras. ``16.3'' stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner, in a meeting room. Part of this corpus has already been successfully used to report research results.

REPORT
lathoud-rr-04-28/IDIAP
AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking
Lathoud, Guillaume
Odobez, Jean-Marc
Gatica-Perez, Daniel
EXTERNAL http://publications.idiap.ch/attachments/reports/2004/rr-04-28.pdf PUBLIC
Idiap-RR-28-2004
2004
IDIAP
Martigny, Switzerland
Published in ``Proceedings of the 2004 MLMI Workshop''
Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground-truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called ``AV16.3'', along with a method for 3-D location annotation based on calibrated cameras. ``16.3'' stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner, in a meeting room. Part of this corpus has already been successfully used to report research results.

</datafield>

<subfield code="a">lathoud04c/IDIAP</subfield>

</datafield>

<subfield code="a">AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking</subfield>

</datafield>

<subfield code="a">Lathoud, Guillaume</subfield>

</datafield>

<subfield code="a">Odobez, Jean-Marc</subfield>

</datafield>

<subfield code="a">Gatica-Perez, Daniel</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/papers/2004/lathoud04c.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="u">http://publications.idiap.ch/index.php/publications/showcite/lathoud-rr-04-28</subfield>

<subfield code="z">Related documents</subfield>

</datafield>

<subfield code="a">Proceedings of the 2004 MLMI Workshop, S. Bengio and H. Bourlard Eds, Springer Verlag</subfield>

</datafield>

</datafield>

<subfield code="a">IDIAP-RR 04-28</subfield>

</datafield>

<subfield code="a">Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground-truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called ``AV16.3'', along with a method for 3-D location annotation based on calibrated cameras. ``16.3'' stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner, in a meeting room. Part of this corpus has already been successfully used to report research results.</subfield>

</datafield>

</record>

<subfield code="a">REPORT</subfield>

</datafield>

<subfield code="a">lathoud-rr-04-28/IDIAP</subfield>

</datafield>

<subfield code="a">AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking</subfield>

</datafield>

<subfield code="a">Lathoud, Guillaume</subfield>

</datafield>

<subfield code="a">Odobez, Jean-Marc</subfield>

</datafield>

<subfield code="a">Gatica-Perez, Daniel</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/reports/2004/rr-04-28.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="a">Idiap-RR-28-2004</subfield>

</datafield>

<subfield code="b">IDIAP</subfield>

<subfield code="a">Martigny, Switzerland</subfield>

</datafield>

<subfield code="a">Published in ``Proceedings of the 2004 MLMI Workshop''</subfield>

</datafield>

</datafield>

</record>

</collection>