CONF Vlasenko_INTERSPEECH_2025/IDIAP
Title: Multimodal Prosody Modeling: A Use Case for Multilingual Sentence Mode Prediction
Authors: Vlasenko, Bogdan; Magimai-Doss, Mathew
Keywords: emotional prosody; multilingual; multimodal; sentence mode prediction
URL: https://publications.idiap.ch/attachments/papers/2025/Vlasenko_INTERSPEECH_2025.pdf [external, public]
Published in: Proceedings of Interspeech 2025

Abstract: Prosody modeling has garnered significant attention from the speech processing community. Multilingual latent spaces for representing linguistic and acoustic information have recently become a trend across several research directions. We therefore evaluate the ability of multilingual acoustic neural embeddings and knowledge-based features to preserve sentence-mode-related information at the suprasegmental level. For linguistic information modeling, we selected neural embeddings based on word- and phoneme-level latent space representations. The experimental study was conducted on Italian, French, and German audiobook recordings, as well as emotional speech samples from EMO-DB. Both intra- and inter-language experimental protocols were used to assess classification performance for unimodal and multimodal (early fusion) features. For comparison, we used a sentence mode prediction system built on top of automatically generated Whisper transcripts.
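
The abstract mentions an early fusion approach for combining acoustic and linguistic features. The following is a minimal sketch of what early fusion for sentence mode classification can look like in practice: per-utterance feature vectors from each modality are concatenated before a single classifier is trained. It is illustrative only and not the paper's implementation; the embedding dimensions, the classifier choice, and all variable names are assumptions.

    # Early-fusion sketch (illustrative assumptions, not the authors' system).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def early_fusion(acoustic_emb: np.ndarray, linguistic_emb: np.ndarray) -> np.ndarray:
        """Concatenate per-utterance acoustic and linguistic feature vectors."""
        return np.concatenate([acoustic_emb, linguistic_emb], axis=1)

    # Toy data: 100 utterances, assumed 256-dim acoustic and 300-dim linguistic
    # embeddings, binary sentence mode labels (0 = declarative, 1 = interrogative).
    rng = np.random.default_rng(0)
    X_acoustic = rng.normal(size=(100, 256))
    X_linguistic = rng.normal(size=(100, 300))
    y_mode = rng.integers(0, 2, size=100)

    X_fused = early_fusion(X_acoustic, X_linguistic)
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_fused, y_mode)
    print(clf.score(X_fused, y_mode))

In an actual intra- or inter-language protocol, the toy arrays would be replaced by embeddings extracted from the audiobook and EMO-DB recordings, with training and evaluation splits drawn from the same or from different languages, respectively.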