CONF
Dubagunta_ICMI’22COMPANION_2022/IDIAP
Towards Automatic Prediction of Non-Expert Perceived Speech Fluency Ratings
Dubagunta, S. Pavankumar
Moneta, Edoardo
Theocharopoulos, Eleni
Magimai-Doss, Mathew
EXTERNAL
https://publications.idiap.ch/attachments/papers/2022/Dubagunta_ICMI?22COMPANION_2022.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Dubagunta_Idiap-RR-11-2021
Related documents
ACM International Conference on Multimodal Interaction (ICMI Companion)
2022
https://doi.org/10.1145/3536220.3563689
doi
REPORT
Dubagunta_Idiap-RR-11-2021/IDIAP
Towards Automatic Prediction of Non-Expert Perceived Speech Fluency Ratings
Dubagunta, S. Pavankumar
Moneta, Edoardo
Theocharopoulos, Eleni
Magimai-Doss, Mathew
articulatory features
bag of audio words
low level descriptors
Perceived fluency
raw waveform modelling
speech assessment
Zero frequency filtering
EXTERNAL
https://publications.idiap.ch/attachments/reports/2021/Dubagunta_Idiap-RR-11-2021.pdf
PUBLIC
Idiap-RR-11-2021
2021
Idiap
August 2021
Automatic speech fluency prediction has mainly been approached from the perspective of computer-aided language learning, where the system aims to predict ratings similar to those of human experts. Speech fluency prediction, however, can also be considered in a more relaxed social setting, where the ratings come mostly from non-experts. This paper explores the latter direction, i.e., the prediction of non-expert perceived speech fluency ratings, which, to the best of our knowledge, has not been studied in the speech technology literature. To that end, we investigate three approaches: (a) low-level descriptor feature functionals, (b) a bag-of-audio-words based approach, and (c) a neural network based end-to-end acoustic modelling approach. Our investigations on speech data collected from 54 speakers and rated by seven non-experts demonstrate that non-expert speech fluency ratings can be systematically predicted, with the best-performing system yielding a Pearson's correlation coefficient of 0.66 and a Spearman's correlation coefficient of 0.67 with the median human scores.
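The evaluation described in the abstract (correlating system predictions with the median of the non-expert ratings) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the rating scale, the toy ratings, the predicted scores and the helper names (`pearson`, `average_ranks`, `spearman`) are hypothetical and are not taken from the paper or its data.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson's correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: 5 speakers, each rated by 7 non-expert raters.
ratings = [
    [1, 2, 2, 1, 3, 2, 2],
    [4, 4, 5, 3, 4, 4, 5],
    [3, 3, 2, 3, 4, 3, 3],
    [5, 5, 4, 5, 5, 4, 5],
    [2, 1, 2, 2, 1, 3, 2],
]
# Hypothetical system outputs, one predicted fluency score per speaker.
predicted = [1.8, 4.1, 3.4, 4.6, 2.5]

# Reference score per speaker: the median over the seven raters.
median_scores = [median(r) for r in ratings]

pcc = pearson(predicted, median_scores)
scc = spearman(predicted, median_scores)
print(f"Pearson: {pcc:.2f}, Spearman: {scc:.2f}")
```

Taking the median rather than the mean as the reference keeps a single outlying rater from shifting a speaker's target score, which is one plausible reason to evaluate against median human scores.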