An Adaptive Initialization Method for Speaker Diarization based on Prosodic Features

Type of publication:	Idiap-RR
Citation:	Imseng_Idiap-RR-02-2010
Number:	Idiap-RR-02-2010
Year:	2010
Month:	1
Institution:	Idiap
Abstract:	The following article presents a novel, adaptive initialization scheme that can be applied to most state-of-the-art Speaker Diarization algorithms, i.e. algorithms that use agglomerative hierarchical clustering with Bayesian Information Criterion (BIC) and Gaussian Mixture Models (GMMs) of frame-based cepstral features (MFCCs). The initialization method is a combination of the recently proposed “adaptive seconds per Gaussian” (ASPG) method and a new pre-clustering and number of initial clusters estimation method based on prosodic features. The presented initialization method has two important advantages. First, the method requires no manual tuning and is robust against file length and speaker count variations. Second, the method outperforms our previously used initialization methods on all benchmark files that were presented in the 2006, 2007, and 2009 NIST Rich Transcription (RT) evaluations and results in a Diarization Error Rate (DER) improvement of up to 67% (relative).
Keywords:
Projects:	Idiap, IM2, AMIDA
Authors:	Imseng, David; Friedland, Gerald
Crossref by:	Imseng_ICASSP_2010
Attachments
Imseng_Idiap-RR-02-2010.pdf (MD5: 41f6a60deb860418cf7581e8125ff295)