Idiap Research Institute
Comparing Biosignal and Acoustic feature Representation for Continuous Emotion Recognition
Type of publication: Conference paper
Citation: Yadav_ACMMM2022_2022
Publication status: Accepted
Booktitle: International Multimodal Sentiment Analysis Workshop and Challenge
Year: 2022
Abstract: Automatic recognition of human emotion has a wide range of applications. Human emotions can be identified across different modalities, such as biosignals, speech, text, and mimics. This paper focuses on time-continuous prediction of the level of valence and psycho-physiological arousal. In that regard, we investigate (a) the use of different feature embeddings obtained from neural networks pre-trained on different speech tasks (e.g., phone classification, speech emotion recognition) and from self-supervised neural networks, (b) the estimation of arousal and valence from physiological signals in an end-to-end manner, and (c) the combination of different neural embeddings. Our investigations on the MuSe-Stress sub-challenge show that (a) the embeddings extracted from physiological signals using CNNs trained in an end-to-end manner improve over the baseline approach of modeling physiological signals, (b) neural embeddings obtained from a phone classification neural network and a speech emotion recognition neural network trained on auxiliary language data sets yield improvement over baseline systems trained purely on the target data, and (c) task-specific neural embeddings yield improved performance over self-supervised neural embeddings for both arousal and valence. Our best-performing system on the test set surpasses the DeepSpectrum baseline (combined score) by a relative margin of 7.7%.
Keywords: breathing patterns, Emotion Recognition, modalities fusion, pre-trained embedding, Self-supervised embedding
Projects TIPS
Authors Yadav, Sarthak
Purohit, Tilak
Mostaani, Zohreh
Vlasenko, Bogdan
Magimai.-Doss, Mathew
  • Yadav_ACMMM2022_2022.pdf