Idiap Research Institute
Multimodal Prosody Modeling: A Use Case for Multilingual Sentence Mode Prediction
Type of publication: Conference paper
Citation: Vlasenko_INTERSPEECH_2025
Publication status: Accepted
Booktitle: Proceedings of Interspeech
Year: 2025
Abstract: Prosody modeling has garnered significant attention from the speech processing community. Multilingual latent spaces for representing linguistic and acoustic information have recently emerged as a trend across several research directions. We therefore evaluate the ability of multilingual acoustic neural embeddings and knowledge-based features to preserve sentence-mode-related information at the suprasegmental level. For linguistic information modeling, we selected neural embeddings based on word- and phoneme-level latent space representations. The experimental study was conducted on Italian, French, and German audiobook recordings, as well as emotional speech samples from EMO-DB. Both intra- and inter-language experimental protocols were used to assess classification performance for unimodal and multimodal (early fusion) features. For comparison, we used a sentence mode prediction system built on top of automatically generated WHISPER-based transcripts.
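The abstract refers to an early fusion of unimodal acoustic and linguistic features for sentence mode prediction. As a rough illustration only (not the authors' implementation), the sketch below shows early fusion as feature concatenation before a single classifier; the feature dimensionalities, the three-way sentence-mode label set, the random placeholder data, and the logistic-regression classifier are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical pre-extracted, utterance-level features for N utterances:
#   acoustic   - multilingual acoustic embeddings (dimensionality is illustrative)
#   linguistic - word-/phoneme-level text embeddings (dimensionality is illustrative)
rng = np.random.default_rng(0)
n_utts = 200
acoustic = rng.normal(size=(n_utts, 512))
linguistic = rng.normal(size=(n_utts, 768))
# Assumed sentence-mode labels, e.g. 0 = statement, 1 = question, 2 = exclamation
labels = rng.integers(0, 3, size=n_utts)

# Early fusion: concatenate the unimodal feature vectors per utterance
# and train a single classifier on the joint representation.
fused = np.concatenate([acoustic, linguistic], axis=1)

for name, feats in [("acoustic", acoustic),
                    ("linguistic", linguistic),
                    ("early fusion", fused)]:
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, feats, labels, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Comparing the three rows gives a direct unimodal-versus-early-fusion comparison; the intra- versus inter-language protocols mentioned in the abstract would correspond to how the train/test splits are drawn across languages rather than to the fusion step itself.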
Main Research Program: AI for Everyone
Additional Research Programs: Human-AI Teaming, AI for Everyone
Keywords: emotional prosody, multilingual, multimodal, sentence mode prediction
Projects: Idiap, IICT
Authors: Vlasenko, Bogdan
Magimai-Doss, Mathew
Attachments
  • Vlasenko_INTERSPEECH_2025.pdf