CONF
Fritsch_ICASSP_2020/IDIAP
Estimating The Degree of Sleepiness by Integrating Articulatory Feature Knowledge In Raw Waveform Based CNNs
Fritsch, Julian
Dubagunta, S. Pavankumar
Magimai-Doss, Mathew
articulatory features
Convolutional Neural Networks
end-to-end acoustic modeling
Paralinguistic speech processing
sleepiness
EXTERNAL
https://publications.idiap.ch/attachments/papers/2020/Fritsch_ICASSP_2020.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Fritsch_Idiap-RR-06-2019
Related documents
International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Barcelona, Spain
2020
Speech-based degree of sleepiness estimation is an emerging research problem. This paper investigates an end-to-end approach in which, given the raw waveform as input, a convolutional neural network (CNN) estimates the degree of sleepiness at its output. Within this approach, we investigate constraining the first-layer processing and integrating speech production knowledge through transfer learning. We evaluate these methods on the continuous sleepiness corpus of the Interspeech 2019 Computational Paralinguistics (ComParE) Challenge and demonstrate that the proposed approach consistently yields competitive systems. In particular, we observe that integrating speech production knowledge improves performance and yields complementary systems.
REPORT
Fritsch_Idiap-RR-06-2019/IDIAP
Estimating The Degree of Sleepiness by Integrating Articulatory Feature Knowledge In Raw Waveform Based CNNs
Fritsch, Julian
Dubagunta, S. Pavankumar
Magimai-Doss, Mathew
EXTERNAL
https://publications.idiap.ch/attachments/reports/2019/Fritsch_Idiap-RR-06-2019.pdf
PUBLIC
Idiap-RR-06-2019
2019
Idiap
February 2019
Speech-based degree of sleepiness estimation is an emerging research problem. In the literature, this problem has mainly been addressed through modeling of low-level descriptors. This paper investigates an end-to-end approach in which, given the raw waveform as input, a neural network estimates the degree of sleepiness at its output. Through an investigation on the continuous sleepiness sub-challenge of the INTERSPEECH 2019 Computational Paralinguistics Challenge, we show that the proposed approach consistently yields performance comparable to or better than regression systems based on low-level descriptors, bag-of-audio-words, and sequence-to-sequence autoencoder feature representations. Furthermore, a confusion matrix analysis on the development set shows that, unlike the best baseline system, the performance of our approach does not concentrate around a few degrees of sleepiness, but is spread across all the degrees of sleepiness.