Idiap Research Institute
Audio–Visual Synchronisation for Speaker Diarisation
Type of publication: Conference paper
Citation: Garau_INTERSPEECH_2010
Booktitle: International Conference on Speech and Language Processing, Interspeech
Year: 2010
Month: 9
Location: Makuhari, Japan
Abstract: The role of audio–visual speech synchrony for speaker diarisation is investigated in the multiparty meeting domain. We measured both mutual information and canonical correlation on different sets of audio and video features. As acoustic features we considered energy and MFCCs. As visual features we experimented both with motion intensity features, computed on the whole image, and with Kanade–Lucas–Tomasi (KLT) motion estimation. Thanks to KLT we decomposed the motion into its horizontal and vertical components. The vertical component was found to be more reliable for speech synchrony estimation. The mutual information between acoustic energy and the KLT vertical motion of skin pixels not only resulted in a 20% relative improvement over an MFCC-only diarisation system, but also outperformed visual features such as motion intensities and head poses.
Keywords: audio–visual speech synchrony, canonical correlation analysis, multimodal speaker diarisation, multiparty meetings, mutual information
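As a rough illustration of the mutual-information measurement described in the abstract, the sketch below estimates histogram-based MI (in bits) between two 1-D feature streams, such as frame-level acoustic energy and a vertical motion signal. The bin count, discretisation scheme, and example data are illustrative assumptions, not the paper's exact estimator.

```python
import math
from collections import Counter

def mutual_information(x, y, bins=8):
    """Histogram estimate of mutual information (in bits) between two
    1-D feature streams. Generic sketch; not the paper's implementation."""
    def discretise(v):
        # Map each sample to one of `bins` equal-width bins over its range.
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # guard against constant signals
        return [min(int((s - lo) / width), bins - 1) for s in v]

    xd, yd = discretise(x), discretise(y)
    n = len(xd)
    pxy = Counter(zip(xd, yd))        # joint bin counts
    px, py = Counter(xd), Counter(yd)  # marginal bin counts
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab * n * n / (px[a] * py[b]))
    return mi

# Hypothetical example: a motion stream perfectly synchronous with energy
energy = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85] * 20
motion = [e * 2.0 for e in energy]
print(mutual_information(energy, motion))  # high MI for synchronous streams
```

In the paper's setting, higher MI between acoustic energy and a speaker's vertical skin-pixel motion indicates which participant is speaking; this toy estimator only conveys the shape of that computation.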
Projects: Idiap
Authors: Garau, Giulia
Dielmann, Alfred
Bourlard, Hervé
  • Garau_INTERSPEECH_2010.pdf