Multimodal Prosody Modeling: A Use Case for Multilingual Sentence Mode Prediction
Type of publication: | Conference paper |
Citation: | Vlasenko_INTERSPEECH_2025 |
Publication status: | Accepted |
Booktitle: | Proceedings of Interspeech |
Year: | 2025 |
Abstract: | Prosody modeling has garnered significant attention from the speech processing community. Multilingual latent spaces for representing linguistic and acoustic information have recently become a trend across various research directions. We therefore evaluate the ability of multilingual acoustic neural embeddings and knowledge-based features to preserve sentence-mode-related information at the suprasegmental level. For linguistic information modeling, we selected neural embeddings based on word- and phoneme-level latent space representations. The experimental study was conducted on Italian, French, and German audiobook recordings, as well as emotional speech samples from EMO-DB. Both intra- and inter-language experimental protocols were used to assess classification performance for uni- and multimodal (early-fusion) features. For comparison, we used a sentence mode prediction system built on top of automatically generated Whisper transcripts. |
Main Research Program: | AI for Everyone |
Additional Research Programs: | Human-AI Teaming, AI for Everyone |
Keywords: | emotional prosody, multilingual, multimodal, sentence mode prediction |
Projects: | Idiap IICT |
Authors: | |
Added by: | [UNK] |