CONF
Parthasarathi_ICMI-MLMI2009_2009/IDIAP
Speaker Change Detection with Privacy-Preserving Audio Cues
Parthasarathi, Sree Hari Krishnan
Magimai-Doss, Mathew
Gatica-Perez, Daniel
Bourlard, Hervé
EXTERNAL
https://publications.idiap.ch/attachments/papers/2009/Parthasarathi_ICMI-MLMI2009_2009.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Parthasarathi_Idiap-RR-23-2009
Related documents
Proceedings of ICMI-MLMI 2009
2009
In this paper, we investigate a set of privacy-sensitive audio features for speaker change detection (SCD) in multiparty conversations. These features are based on three different principles: characterizing the excitation source information using the linear prediction residual, characterizing subband spectral information shown to contain speaker information, and characterizing the general shape of the spectrum. Experiments show that the performance of the privacy-sensitive features is comparable to or better than that of the state-of-the-art full-band spectral-based features, namely mel frequency cepstral coefficients, which suggests that socially acceptable ways of recording conversations in real life are feasible.
REPORT
Parthasarathi_Idiap-RR-23-2009/IDIAP
Speaker Change Detection with Privacy-Preserving Audio Cues
Parthasarathi, Sree Hari Krishnan
Magimai-Doss, Mathew
Gatica-Perez, Daniel
Bourlard, Hervé
EXTERNAL
https://publications.idiap.ch/attachments/reports/2009/Parthasarathi_Idiap-RR-23-2009.pdf
PUBLIC
Idiap-RR-23-2009
2009
Idiap
Idiap Research Institute, Martigny, Switzerland
August 2009
In this paper, we investigate a set of privacy-sensitive audio features for speaker change detection (SCD) in multiparty conversations. These features are based on three different principles: characterizing the excitation source information using the linear prediction residual, characterizing subband spectral information shown to contain speaker information, and characterizing the general shape of the spectrum. Experiments show that the performance of the privacy-sensitive features is comparable to or better than that of the state-of-the-art full-band spectral-based features, namely mel frequency cepstral coefficients, which suggests that socially acceptable ways of recording conversations in real life are feasible.