MULTI-MODAL SPEAKER DIARIZATION OF REAL-WORLD MEETINGS USING COMPRESSED-DOMAIN VIDEO FEATURES

Type of publication:	Conference paper
Citation:	Friedland_ICASSP_2009
Booktitle:	International Conference on Acoustics, Speech, and Signal Processing
Year:	2009
Abstract:	Speaker diarization is originally defined as the task of determining "who spoke when" given an audio track and no other prior knowledge of any kind. The following article shows a multi-modal approach where we improve a state-of-the-art speaker diarization system by combining standard acoustic features (MFCCs) with compressed-domain video features. The approach is evaluated on over 4.5 hours of the publicly available AMI meetings dataset, which contains challenges such as people standing up and walking out of the room. We show a consistent improvement of about 34% relative in speaker error rate (21% DER) compared to a state-of-the-art audio-only baseline.
Keywords:
Projects:	Idiap IM2
Authors:	Friedland, Gerald; Hung, Hayley; Yeo, Chuohao
Attachments
Friedland_ICASSP_2009.pdf