Supervised domain adaptation for text-independent speaker verification using limited data

Type of publication:	Conference paper
Citation:	Sarfjoo_INTERSPEECH_2020
Publication status:	Accepted
Booktitle:	Interspeech
Year:	2020
Pages:	3815-3819
URL:	http://www.interspeech2020.org...
Abstract:	To adapt a speaker verification (SV) system to a target domain with limited data, this paper investigates transfer learning from a model pre-trained on source-domain data. To that end, layer-by-layer adaptation with transfer learning from the initial and final layers of the pre-trained model is investigated. We show that the model adapted from the initial layers outperforms the model adapted from the final layers. Based on this evidence, and inspired by work in the image recognition field, we hypothesize that low-level convolutional neural network (CNN) layers characterize domain-specific components, while high-level CNN layers are domain-independent and have more discriminative power. To adapt these domain-specific components, angular margin softmax (AMSoftmax) is applied to the CNN-based implementation of the x-vector architecture. In addition, to reduce over-fitting on the limited target data, transfer learning on the batch norm layers is investigated. Mean shift and covariance estimation of batch norm make it possible to map the represented components of the target domain to the source domain. Using TDNN and E-TDNN versions of the x-vectors as baseline models, the adapted models outperformed the baselines on the development set of NIST SRE 2018, with relative improvements of 11.0% and 13.8%, respectively.
Keywords:	batch norm, speaker recognition, speaker verification, supervised adaptation, transfer learning
Projects:	ODESSA, EC H2020-ROXANNE
Authors:	Sarfjoo, Seyyed Saeed; Madikeri, Srikanth; Motlicek, Petr; Marcel, Sébastien
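
The abstract mentions mean shift and covariance estimation in the batch norm layers as a way to map target-domain components to the source domain. The sketch below illustrates only the general idea and is not the authors' implementation: it re-estimates per-channel batch norm statistics on target-domain data while leaving the learned scale and shift untouched. The per-channel variance stands in for the covariance as a simplifying assumption, and the function names, the synthetic data, and the feature dimension are all hypothetical.

```python
import numpy as np

def estimate_stats(activations):
    """Per-channel mean and variance over the batch axis (hypothetical helper)."""
    return activations.mean(axis=0), activations.var(axis=0)

def batchnorm(x, mean, var, gamma, beta, eps=1e-5):
    """Standard batch norm transform using fixed (inference-time) statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Synthetic stand-ins for layer activations; real x-vector features are assumed, not used.
rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, scale=1.0, size=(5000, 4))
target = rng.normal(loc=2.0, scale=3.0, size=(5000, 4))  # shifted and rescaled domain

# Learned affine parameters are kept fixed; only the statistics are re-estimated.
gamma, beta = np.ones(4), np.zeros(4)
src_mean, src_var = estimate_stats(source)
tgt_mean, tgt_var = estimate_stats(target)

unadapted = batchnorm(target, src_mean, src_var, gamma, beta)  # source statistics
adapted = batchnorm(target, tgt_mean, tgt_var, gamma, beta)    # target statistics
```

With target statistics, the normalized target activations are again roughly zero-mean and unit-variance, which is the distribution the later, frozen layers saw during source-domain training. With source statistics they stay shifted and rescaled.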