CONF
Dey_ICASSP-2_2016/IDIAP
INFORMATION THEORETIC CLUSTERING FOR UNSUPERVISED DOMAIN-ADAPTATION
Dey, Subhadeep
Madikeri, Srikanth
Motlicek, Petr
https://publications.idiap.ch/index.php/publications/showcite/Dey_Idiap-RR-09-2016
Related documents
Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016)
Shanghai
2016
IEEE
5580-5584
The aim of the domain-adaptation task for speaker verification is
to exploit unlabelled target domain data by using the labelled source
domain data effectively. The i-vector based Probabilistic Linear
Discriminant Analysis (PLDA) framework approaches this task by
clustering the target domain data and using each cluster as a unique
speaker to estimate PLDA model parameters. These parameters are
then combined with the PLDA parameters from the source domain.
Typically, agglomerative clustering with cosine distance measure is
used. In tasks such as speaker diarization that also require
unsupervised clustering of speakers, information-theoretic clustering
measures have been shown to be effective. In this paper, we employ
the Information Bottleneck (IB) clustering technique to find speaker
clusters in the target domain data. This is achieved by optimizing the
IB criterion that minimizes the information loss during the
clustering process. The greedy optimization of the IB criterion involves
agglomerative clustering using the Jensen-Shannon divergence as the
distance metric. Our experiments in the domain-adaptation task
indicate that the proposed system outperforms the baseline by about
14% relative in terms of equal error rate.
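The greedy step described in the abstract can be illustrated with a
minimal sketch. It is not the authors' implementation. It assumes each
target-domain item x is represented by a hypothetical posterior
distribution p(y|x) over some set of relevance variables, gives every
item equal prior weight, and stops at a fixed number of clusters. The
names agglomerative_ib, merge_cost and kl are illustrative only. At
each step the pair of clusters whose merge loses the least information,
measured by a weighted Jensen-Shannon divergence, is merged.

```python
import numpy as np


def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p == 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def merge_cost(w_i, p_i, w_j, p_j):
    """Information lost by merging two clusters.

    w_i, w_j are cluster priors p(c); p_i, p_j are p(y|c).
    The cost is (w_i + w_j) times the Jensen-Shannon divergence
    weighted by the relative cluster priors.
    """
    w = w_i + w_j
    pi_i, pi_j = w_i / w, w_j / w
    p_merged = pi_i * p_i + pi_j * p_j
    js = pi_i * kl(p_i, p_merged) + pi_j * kl(p_j, p_merged)
    return w * js


def agglomerative_ib(p_y_given_x, n_clusters):
    """Greedy bottom-up clustering; returns one integer label per row.

    p_y_given_x: array of shape (n_items, n_relevance), rows sum to 1
    (an assumed input representation, see the note above).
    """
    n = p_y_given_x.shape[0]
    weights = {i: 1.0 / n for i in range(n)}        # uniform p(x), assumed
    dists = {i: p_y_given_x[i].astype(float) for i in range(n)}
    members = {i: [i] for i in range(n)}

    while len(members) > n_clusters:
        keys = sorted(members)
        best = None
        # Exhaustive search for the cheapest pair to merge.
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                cost = merge_cost(weights[a], dists[a], weights[b], dists[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        # Merge b into a: prior-weighted average of p(y|c), summed prior.
        w = weights[a] + weights[b]
        dists[a] = (weights[a] * dists[a] + weights[b] * dists[b]) / w
        weights[a] = w
        members[a].extend(members[b])
        del weights[b], dists[b], members[b]

    labels = np.empty(n, dtype=int)
    for label, key in enumerate(sorted(members)):
        labels[members[key]] = label
    return labels
```

For example, calling agglomerative_ib on four rows, two concentrated on
the first relevance variable and two on the second, with n_clusters=2
groups each similar pair together. The pairwise search is quadratic in
the number of clusters at every step, so this form is suitable only
for small illustrations.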
REPORT
Dey_Idiap-RR-09-2016/IDIAP
INFORMATION THEORETIC CLUSTERING FOR UNSUPERVISED DOMAIN-ADAPTATION
Dey, Subhadeep
Madikeri, Srikanth
Motlicek, Petr
Idiap-RR-09-2016
2016
Idiap
April 2016
The aim of the domain-adaptation task for speaker verification is
to exploit unlabelled target domain data by using the labelled source
domain data effectively. The i-vector based Probabilistic Linear
Discriminant Analysis (PLDA) framework approaches this task by
clustering the target domain data and using each cluster as a unique
speaker to estimate PLDA model parameters. These parameters are
then combined with the PLDA parameters from the source domain.
Typically, agglomerative clustering with cosine distance measure is
used. In tasks such as speaker diarization that also require
unsupervised clustering of speakers, information-theoretic clustering
measures have been shown to be effective. In this paper, we employ
the Information Bottleneck (IB) clustering technique to find speaker
clusters in the target domain data. This is achieved by optimizing the
IB criterion that minimizes the information loss during the
clustering process. The greedy optimization of the IB criterion involves
agglomerative clustering using the Jensen-Shannon divergence as the
distance metric. Our experiments in the domain-adaptation task
indicate that the proposed system outperforms the baseline by about
14% relative in terms of equal error rate.