CONF ajmera2002icslp/IDIAP
Unknown-Multiple Speaker clustering using HMM
Ajmera, Jitendra; Bourlard, Hervé; Lapidot, I.; McCowan, Iain A.
EXTERNAL https://publications.idiap.ch/attachments/reports/2002/ajmera2002icslp.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/ajmera-rr-02-07 Related documents
ICSLP 2002, Denver, Colorado, 573-576
IDIAP-RR 02-07

An HMM-based speaker clustering framework is presented, where the number of speakers and the segmentation boundaries are unknown \emph{a priori}. Ideally, the system aims to create one pure cluster for each speaker. The HMM is ergodic in nature with a minimum duration topology. The final number of clusters is determined automatically by merging the closest clusters and retraining the new cluster, until a decrease in likelihood is observed. In the same framework, we also examine the effect of using only the features from highly voiced frames as a means of improving the robustness and reducing the computational complexity of the algorithm. The proposed system is assessed on the 1996 HUB-4 evaluation test set in terms of both cluster and speaker purity. It is shown that the number of clusters found often corresponds to the actual number of speakers.

REPORT ajmera-rr-02-07/IDIAP
Unknown-Multiple Speaker clustering using HMM
Ajmera, Jitendra; Bourlard, Hervé; Lapidot, I.; McCowan, Iain A.
EXTERNAL https://publications.idiap.ch/attachments/reports/2002/rr02-07.pdf
PUBLIC
Idiap-RR-07-2002
2002
IDIAP, Martigny, Switzerland
ICSLP, Denver, Colorado, 2002
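
The merging procedure described in the abstract (merge the closest pair of clusters, retrain, and stop once the likelihood would decrease) can be illustrated with a minimal sketch of its control flow. This is not the authors' implementation: the function name `merge_until_likelihood_drops`, the caller-supplied `merge_gain` callable, and the 1-D toy features are all assumptions made for illustration. In the paper's setting the gain would come from retraining HMM states on the merged data; here `toy_gain` is only a stand-in based on the distance between cluster means.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Cluster = List[float]  # frames assigned to one cluster (1-D toy features)


def merge_until_likelihood_drops(
    clusters: Sequence[Cluster],
    merge_gain: Callable[[Cluster, Cluster], float],
) -> List[Cluster]:
    """Greedy agglomerative merging with a likelihood-based stopping rule.

    merge_gain(a, b) is a hypothetical, caller-supplied function returning the
    change in total log-likelihood if clusters a and b were replaced by one
    retrained cluster (positive means the merge helps).
    """
    current = [list(c) for c in clusters]
    while len(current) > 1:
        # Score every candidate pair; the "closest" pair has the largest gain.
        gain, i, j = max(
            (merge_gain(current[i], current[j]), i, j)
            for i, j in combinations(range(len(current)), 2)
        )
        if gain <= 0:
            break  # the best merge no longer increases the likelihood: stop
        merged = current[i] + current[j]
        current = [c for k, c in enumerate(current) if k not in (i, j)]
        current.append(merged)
    return current


def toy_gain(a: Cluster, b: Cluster) -> float:
    """Stand-in criterion (assumption): positive when means differ by < 1.0."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return 1.0 - abs(mean_a - mean_b)


# Four initial clusters drawn from two well-separated "speakers".
initial = [[0.0, 0.2], [0.1, 0.3], [5.0, 5.2], [5.1, 4.9]]
result = merge_until_likelihood_drops(initial, toy_gain)
```

Starting from deliberately over-segmented clusters and letting the stopping rule decide when to halt is what allows the number of speakers to be left unspecified; the quality of the result depends entirely on how the gain is computed, which this sketch leaves abstract.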