SYSTEM FUSION AND SPEAKER LINKING FOR LONGITUDINAL DIARIZATION OF TV SHOWS

Type of publication:	Conference paper
Citation:	Ferras_ICASSP_2016
Booktitle:	Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016)
Year:	2016
Month:	March
Pages:	5495-5499
Publisher:	IEEE
Location:	Shanghai
Abstract:	Performing speaker diarization while uniquely identifying the speakers in a collection of audio recordings is a challenging task. Based on our previous work on speaker diarization and linking, we developed a system for diarizing longitudinal TV show data sets based on the fusion of speaker diarization system outputs and speaker linking. Agreement between multiple diarization outputs is found prior to speaker linking, largely reducing the diarization error rate at the expense of keeping some speech data unlabelled. To deal with noisy clusters, a linear prediction based technique was used to label speakers after linking. Considerable gains for both fusion and labelling are reported. Despite the challenges of the longitudinal diarization task, this system obtained similar performance for linked and non-linked tasks under moderate session variability, highlighting the viability of a linking approach to longitudinal diarization of speech in the presence of noise, music and special audio effects.
Keywords:
Projects:	Idiap SIIP
Authors:	Ferras, Marc; Madikeri, Srikanth; Motlicek, Petr; Bourlard, Hervé
Attachments
Ferras_ICASSP_2016.pdf