
<subfield code="a">REPORT</subfield>

</datafield>

<subfield code="a">Sarfjoo_Idiap-RR-15-2019/IDIAP</subfield>

</datafield>

<subfield code="a">Idiap submission to the NIST SRE 2019 Speaker Recognition Evaluation</subfield>

</datafield>

<subfield code="a">Sarfjoo, Seyyed Saeed</subfield>

</datafield>

<subfield code="a">Madikeri, Srikanth</subfield>

</datafield>

<subfield code="a">Hajibabaei, Mahdi</subfield>

</datafield>

<subfield code="a">Motlicek, Petr</subfield>

</datafield>

<subfield code="a">Marcel, Sébastien</subfield>

</datafield>

<subfield code="a">adaptation</subfield>

</datafield>

<subfield code="a">batch normalization</subfield>

</datafield>

<subfield code="a">speaker recognition</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/reports/2019/Sarfjoo_Idiap-RR-15-2019.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="a">Idiap-RR-15-2019</subfield>

</datafield>

<subfield code="b">Idiap</subfield>

<subfield code="a">Rue Marconi 19, 1920 Martigny</subfield>

</datafield>

<subfield code="d">November 2019</subfield>

</datafield>

<subfield code="a">Idiap has made a submission to the conversational telephony speech (CTS) challenge of the NIST SRE 2019. The submission consists of six speaker verification (SV) systems: four extended TDNN (E-TDNN) and two TDNN x-vector systems. The submitted systems differ mainly in their training sets, loss functions, adaptation sets and extracted speech features. Domain adaptation is performed with a supervised method (developed using limited data) based on transfer learning of the batch norm layers. Mean shift and covariance estimation of batch norm make it possible to map the target domain to the source domain, alleviating the problem of over-fitting on the adaptation data. The back-end of all the systems is the conventional Linear Discriminant Analysis (LDA) projection followed by Probabilistic LDA (PLDA) scoring for inference. The PLDA was also adapted in an unsupervised manner using the unlabelled part of the NIST SRE 2018 set. In addition, training the LDA and PLDA using in-domain data was investigated. The entire system was built around the Kaldi toolkit.</subfield>

</datafield>

</record>

</collection>