CONF
Fritsch_ICASSP_2020/IDIAP
Estimating The Degree of Sleepiness by Integrating Articulatory Feature Knowledge In Raw Waveform Based CNNs
Fritsch, Julian
Dubagunta, S. Pavankumar
Magimai-Doss, Mathew
articulatory features
Convolutional Neural Networks
end-to-end acoustic modeling
Paralinguistic speech processing
sleepiness
EXTERNAL
https://publications.idiap.ch/attachments/papers/2020/Fritsch_ICASSP_2020.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Fritsch_Idiap-RR-06-2019
Related documents
International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Barcelona, Spain
2020
Speech-based degree of sleepiness estimation is an emerging research problem. This paper investigates an end-to-end approach in which, given the raw waveform as input, a convolutional neural network (CNN) estimates the degree of sleepiness at its output. Within this approach, we investigate constraining the first-layer processing and integrating speech production knowledge through transfer learning. We evaluate these methods on the continuous sleepiness corpus of the Interspeech 2019 Computational Paralinguistics (ComParE) Challenge and demonstrate that the proposed approach consistently yields competitive systems. In particular, we observe that integrating speech production knowledge improves performance and yields complementary systems.
REPORT
Fritsch_Idiap-RR-06-2019/IDIAP
Estimating The Degree of Sleepiness by Integrating Articulatory Feature Knowledge In Raw Waveform Based CNNs
Fritsch, Julian
Dubagunta, S. Pavankumar
Magimai-Doss, Mathew
EXTERNAL
https://publications.idiap.ch/attachments/reports/2019/Fritsch_Idiap-RR-06-2019.pdf
PUBLIC
Idiap-RR-06-2019
2019
Idiap
February 2019
Speech-based degree of sleepiness estimation is an emerging research problem. In the literature, this problem has mainly been addressed through modeling of low-level descriptors. This paper investigates an end-to-end approach in which, given the raw waveform as input, a neural network estimates the degree of sleepiness at its output. Through an investigation on the continuous sleepiness sub-challenge of the INTERSPEECH 2019 Computational Paralinguistics Challenge, we show that the proposed approach consistently yields performance comparable to or better than regression systems based on low-level descriptors, bag-of-audio-words, and sequence-to-sequence autoencoder feature representations. Furthermore, a confusion matrix analysis on the development set shows that, unlike the best baseline system, the performance of our approach does not concentrate around a few degrees of sleepiness, but is spread across all the degrees of sleepiness.