Normalizing Flows for Speaker and Language Recognition Backend

Type of publication:	Conference paper
Citation:	Espuna_ODYSSEY2024_2024
Publication status:	Accepted
Booktitle:	Odyssey 2024: The Speaker and Language Recognition Workshop
Year:	2024
Abstract:	In this paper, we address the Gaussian distribution assumption made in PLDA, a popular back-end classifier used in speaker and language recognition tasks. We study normalizing flows, which allow non-linear transformations while still yielding a model that explicitly represents a probability density. The model makes no assumption about the distribution of the observations. This alleviates the need for length normalization, a well-known data preprocessing step used to boost PLDA performance. We demonstrate the effectiveness of this flow model on the NIST SRE16, LRE17 and LRE22 datasets. With length normalization applied, the flow model and PLDA achieve similar equal error rates (EERs) on SRE16 (11.5% vs 11.8%). Without length normalization, the flow model is more robust and yields better EERs (13.1% vs 17.1%). For LRE17 and LRE22, the best classification accuracies (84.2%, 75.5%) are obtained by the flow model without any need for length normalization.
Keywords:
Projects	Idiap, armasuisse
Authors	Espuña, Aleix; Prasad, Amrutha; Motlicek, Petr; Madikeri, Srikanth; Christof, Schüpbach
Attachments
Espuna_ODYSSEY2024_2024.pdf
Notes
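The abstract refers to two operations that a short sketch may make concrete: length normalization of an embedding, and the change-of-variables rule that lets a normalizing flow report an explicit density. The code below is an illustration only and is not the model from the paper. The function names `length_normalize` and `affine_flow_log_density` are hypothetical, the base distribution is assumed to be a standard normal, and the single elementwise affine layer is chosen for brevity (on its own it is still Gaussian, whereas the flows studied in the paper use non-linear transformations).

```python
import math


def length_normalize(x):
    """Scale a vector to unit Euclidean norm (illustrative helper)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x]


def affine_flow_log_density(x, log_scale, shift):
    """Log-density of x under z = (x - shift) * exp(-log_scale), z ~ N(0, I).

    Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|.
    For this elementwise map, log|det dz/dx| = -sum(log_scale).
    """
    z = [(xi - b) * math.exp(-s) for xi, s, b in zip(x, log_scale, shift)]
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
    log_det = -sum(log_scale)
    return log_base + log_det


# Usage: score a raw embedding and its length-normalized version.
embedding = [3.0, 4.0]
params = {"log_scale": [0.0, 0.0], "shift": [0.0, 0.0]}  # assumed values
raw_score = affine_flow_log_density(embedding, **params)
normalized_score = affine_flow_log_density(length_normalize(embedding), **params)
```

A practical flow would stack several invertible non-linear layers and sum their log-determinant terms in the same way; the bookkeeping shown here stays the same.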
