CONF
Vijayasenan_INTERSPEECH_2010/IDIAP
Advances in Fast Multistream Diarization based on the Information Bottleneck Framework
Vijayasenan, Deepu
Valente, Fabio
Bourlard, Hervé
https://publications.idiap.ch/index.php/publications/showcite/Vijayasenan_Idiap-RR-23-2010
Related documents
Proceedings of Interspeech
2010
Multistream diarization is an effective way to improve diarization
performance, with MFCC and Time Delay of Arrival (TDOA)
being the most commonly used features. This paper
extends our previous work on information bottleneck (IB) diarization,
aiming to include a large number of features besides MFCC
and TDOA while keeping computational costs low. First,
HMM/GMM and IB systems are compared using two and
four feature streams, and an error analysis is performed. Results
on a dataset of 17 meetings show that, despite comparable
oracle performance, the IB system is more robust to feature
weight variations. Then a sequential optimization is introduced
that further reduces the speaker error by 5-8% relative. In
the last part, computational issues are discussed. The proposed
approach is significantly faster, and its complexity grows only
marginally with the number of feature streams: it runs in 0.75 times
real time even with four streams, while achieving a speaker error of
6%.
REPORT
Vijayasenan_Idiap-RR-23-2010/IDIAP
Advances in Fast Multistream Diarization based on the Information Bottleneck Framework
Vijayasenan, Deepu
Valente, Fabio
Bourlard, Hervé
Idiap-RR-23-2010
2010
Idiap
July 2010
Multistream diarization is an effective way to improve diarization performance, with MFCC and Time Delay of Arrival (TDOA) being the most commonly used features. This paper extends our previous work on information bottleneck (IB) diarization, aiming to include a large number of features besides MFCC and TDOA while keeping computational costs low. First, HMM/GMM and IB systems are compared using two and four feature streams, and an error analysis is performed. Results on a dataset of 17 meetings show that, despite comparable oracle performance, the IB system is more robust to feature weight variations. Then a sequential optimization is introduced that further reduces the speaker error by 5-8% relative. In the last part, computational issues are discussed. The proposed approach is significantly faster, and its complexity grows only marginally with the number of feature streams: it runs in 0.75 times real time even with four streams, while achieving a speaker error of 6%.