CONF
Parthasarathi_INTERSPEECH_2009/IDIAP
Investigating Privacy-Sensitive Features for Speech Detection in Multiparty Conversations
Parthasarathi, Sree Hari Krishnan
Magimai-Doss, Mathew
Bourlard, Hervé
Gatica-Perez, Daniel
EXTERNAL
https://publications.idiap.ch/attachments/papers/2009/Parthasarathi_INTERSPEECH_2009.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Parthasarathi_Idiap-RR-12-2009
Related documents
Proceedings of Interspeech 2009
2009
We investigate four privacy-sensitive features, namely energy, zero-crossing rate, spectral flatness, and kurtosis, for speech detection in multiparty conversations. We liken this scenario to a meeting room and define our datasets and annotations accordingly. We also model the temporal context of these features. With no temporal context, energy is the best-performing single feature; once temporal context is modeled, kurtosis emerges as the most effective. We further combine the features: besides yielding a gain in performance, certain combinations show that a shorter temporal context is sufficient. We then benchmark other privacy-sensitive features used in previous studies. Our experiments show that the performance of all the privacy-sensitive features modeled with context is close to that of state-of-the-art spectral-based features, without extracting or using any features that could be used to reconstruct the speech signal.
REPORT
Parthasarathi_Idiap-RR-12-2009/IDIAP
Investigating Privacy-Sensitive Features for Speech Detection in Multiparty Conversations
Parthasarathi, Sree Hari Krishnan
Magimai-Doss, Mathew
Bourlard, Hervé
Gatica-Perez, Daniel
EXTERNAL
https://publications.idiap.ch/attachments/reports/2009/Parthasarathi_Idiap-RR-12-2009.pdf
PUBLIC
Idiap-RR-12-2009
2009
Idiap
Idiap Research Institute, Martigny, Switzerland.
June 2009
We investigate four privacy-sensitive features, namely energy, zero-crossing rate, spectral flatness, and kurtosis, for speech detection in multiparty conversations. We liken this scenario to a meeting room and define our datasets and annotations accordingly. We also model the temporal context of these features. With no temporal context, energy is the best-performing single feature; once temporal context is modeled, kurtosis emerges as the most effective. We further combine the features: besides yielding a gain in performance, certain combinations show that a shorter temporal context is sufficient. We then benchmark other privacy-sensitive features used in previous studies. Our experiments show that the performance of all the privacy-sensitive features modeled with context is close to that of state-of-the-art spectral-based features, without extracting or using any features that could be used to reconstruct the speech signal.
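The four features named in the abstract, and the idea of modeling temporal context, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the frame length and hop (400 and 160 samples), the textbook definitions used (log frame energy, fraction of sign changes, ratio of geometric to arithmetic mean of the power spectrum, fourth standardized moment of the samples), and the hypothetical helpers `frame_signal`, `privacy_sensitive_features`, and `add_temporal_context`. The paper's own definitions, and the model it uses for context, may differ.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def privacy_sensitive_features(x, frame_len=400, hop=160, eps=1e-12):
    """Return an (n_frames, 4) array: log energy, ZCR, spectral flatness, kurtosis.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)

    # Log frame energy.
    energy = np.log(np.sum(frames ** 2, axis=1) + eps)

    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # Spectral flatness: geometric mean / arithmetic mean of the power spectrum.
    window = np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2 + eps
    flatness = np.exp(np.mean(np.log(power), axis=1)) / np.mean(power, axis=1)

    # Kurtosis: fourth standardized moment of the samples in each frame.
    centered = frames - frames.mean(axis=1, keepdims=True)
    variance = np.mean(centered ** 2, axis=1) + eps
    kurtosis = np.mean(centered ** 4, axis=1) / variance ** 2

    return np.stack([energy, zcr, flatness, kurtosis], axis=1)

def add_temporal_context(feats, context):
    """Concatenate each frame with `context` neighbors on each side.

    Edges are padded by repeating the first and last frame. Stacking is only
    one simple way to expose context to a classifier (an assumption here).
    """
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    n = feats.shape[0]
    return np.concatenate(
        [padded[i:i + n] for i in range(2 * context + 1)], axis=1
    )
```

As a usage example, calling `privacy_sensitive_features` on one second of 16 kHz audio and then `add_temporal_context(feats, 2)` yields one row per frame with 4 × 5 = 20 values, which could be fed to any frame-level speech/non-speech classifier. None of these four values per frame is sufficient to resynthesize the waveform, which is the sense in which the abstract calls such features privacy-sensitive.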