DEEP NEURAL NETWORK BASED POSTERIORS FOR TEXT-DEPENDENT SPEAKER VERIFICATION

Type of publication:	Idiap-RR
Citation:	Dey_Idiap-RR-08-2016
Number:	Idiap-RR-08-2016
Year:	2016
Month:	4
Institution:	Idiap
Abstract:	The i-vector and Joint Factor Analysis (JFA) systems for text-dependent speaker verification use sufficient statistics computed from a speech utterance to estimate speaker models. These statistics average the acoustic information over the utterance, thereby losing all the sequence information. In this paper, we study explicit content matching using Dynamic Time Warping (DTW) and present the best achievable error rates for speaker-dependent and speaker-independent content matching. For this purpose, a Deep Neural Network/Hidden Markov Model Automatic Speech Recognition (DNN/HMM ASR) system is used to extract content-related posterior probabilities. This approach outperforms systems using Gaussian mixture model posteriors by at least 50% Equal Error Rate (EER) on the RSR2015 in content mismatch trials. DNN posteriors are also used in i-vector and JFA systems, obtaining EERs as low as 0.02%.
Keywords:
Projects	Idiap SIIP
Authors	Dey, Subhadeep; Madikeri, Srikanth; Ferras, Marc; Motlicek, Petr
Crossref by	Dey_ICASSP_2016
Attachments
Dey_Idiap-RR-08-2016.pdf (MD5: e21ab3ae980519fb172497605eb9c93a)
Notes
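The abstract describes content matching with Dynamic Time Warping over sequences of posterior probability vectors. The following is a minimal illustrative sketch of that general idea, not the report's implementation. The function names, the Euclidean frame distance, the step pattern, and the length normalization are all assumptions made for the example; the report may use a different local cost and different constraints.

```python
import math


def frame_distance(p, q):
    """Euclidean distance between two posterior vectors (assumed local cost)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dtw_distance(enroll, test):
    """Length-normalized DTW cost between two sequences of posterior vectors.

    Hypothetical helper: `enroll` and `test` are lists of equal-dimension
    posterior vectors, one vector per speech frame.
    """
    n, m = len(enroll), len(test)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning enroll[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(enroll[i - 1], test[j - 1])
            # Allowed moves (assumed): vertical, horizontal, diagonal
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Normalize by total length so scores are comparable across utterances
    return cost[n][m] / (n + m)


# Toy posteriors over two content classes
same_content = dtw_distance([[1.0, 0.0], [0.0, 1.0]],
                            [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
diff_content = dtw_distance([[1.0, 0.0], [0.0, 1.0]],
                            [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the utterance with the same content order but a different duration aligns at zero cost, while the utterance with swapped content yields a higher cost. That difference is the sequence information that utterance-level averaging discards.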
