Enhancing Speaker Diarization using Correlation-Based Clustering Initialization
Type of publication: Idiap-RR
Citation: Rangappa_Idiap-RR-09-2025
Number: Idiap-RR-09-2025
Year: 2025
Month: 8
Institution: Idiap
Abstract: Speaker diarization becomes challenging in multilingual and code-switched speech due to frequent speaker changes and acoustic variability. While PyAnnote achieves state-of-the-art performance on standard benchmarks, its effectiveness drops on complex datasets like DISPLACE-2. To address this, we propose improving global agglomerative clustering by refining its input embeddings. Specifically, we analyze the pairwise correlations between embeddings and average those that are highly correlated. This strengthens the speaker representation, reduces speaker confusion, and improves clustering accuracy. Evaluated on DISPLACE-2 Track-1 (multilingual speaker diarization), our method yields a 3% relative DER improvement over the baseline, and 8% when combined with segmentation fine-tuning. Notably, the approach reduces DER in rapid turn-taking and language transition regions, improving robustness in code-mixed speech.
Main Research Program: Human-AI Teaming
Keywords: DISPLACE-2, ECAPA-TDNN embedding, local speaker segmentation, Speaker Diarization
Projects: EC H2020-ROXANNE
ELOQUENCE
Authors: Rangappa, Pradeep
Prasad, Amrutha
Madikeri, Srikanth
Motlicek, Petr
Attachments
  • Rangappa_Idiap-RR-09-2025.pdf (MD5: e7bc25b8416c6d7f1efda01d331814e8)
Notes
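The abstract describes averaging highly correlated embeddings before global agglomerative clustering. The following is a minimal sketch of that idea, not the authors' implementation. The function name `smooth_by_correlation`, the use of Pearson correlation, the threshold value of 0.8, and the choice to replace each embedding with the mean of its correlated neighbors are all assumptions made for illustration; the report itself should be consulted for the actual procedure.

```python
import numpy as np


def smooth_by_correlation(embeddings, threshold=0.8):
    """Replace each embedding with the mean of the embeddings it is
    highly correlated with (itself included).

    `threshold` is a hypothetical value chosen for illustration only.
    """
    x = np.asarray(embeddings, dtype=float)  # shape (n_segments, dim)
    corr = np.corrcoef(x)                    # (n, n) Pearson correlation between rows
    mask = corr >= threshold                 # True where two embeddings are highly correlated
    np.fill_diagonal(mask, True)             # every embedding always counts itself
    weights = mask / mask.sum(axis=1, keepdims=True)  # row-normalized averaging weights
    return weights @ x                       # per-row mean over the correlated neighbors


# Toy example: rows 0 and 1 are nearly identical, row 2 is anti-correlated.
toy = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.0, 3.1, 3.9],
    [4.0, 3.0, 2.0, 1.0],
])
smoothed = smooth_by_correlation(toy)
```

In this toy input the first two rows are averaged together while the third is left unchanged. The smoothed embeddings would then be passed to the clustering step in place of the originals.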