Comparing Biosignal and Acoustic Feature Representation for Continuous Emotion Recognition

Type of publication:	Conference paper
Citation:	Yadav_ACMMM2022_2022
Publication status:	Accepted
Booktitle:	International Multimodal Sentiment Analysis Workshop and Challenge
Year:	2022
Abstract:	Automatic recognition of human emotion has a wide range of applications. Human emotions can be identified across different modalities, such as biosignals, speech, text, and facial expressions. This paper focuses on time-continuous prediction of the level of valence and psycho-physiological arousal. In that regard, we investigate (a) the use of different feature embeddings obtained from neural networks pre-trained on different speech tasks (e.g., phone classification, speech emotion recognition) and from self-supervised neural networks, (b) the estimation of arousal and valence from physiological signals in an end-to-end manner, and (c) the combination of different neural embeddings. Our investigations on the MuSe-Stress sub-challenge show that (a) the embeddings extracted from physiological signals using CNNs trained in an end-to-end manner improve over the baseline approach of modeling physiological signals, (b) neural embeddings obtained from a phone classification neural network and a speech emotion recognition neural network trained on auxiliary language data sets yield improvement over baseline systems trained purely on the target data, and (c) task-specific neural embeddings yield improved performance over self-supervised neural embeddings for both arousal and valence. Our best-performing system on the test set surpasses the DeepSpectrum baseline (combined score) by a relative margin of 7.7%.
Keywords:	breathing patterns, emotion recognition, modalities fusion, pre-trained embedding, self-supervised embedding
Projects:	TIPS, EMIL
Authors:	Yadav, Sarthak; Purohit, Tilak; Mostaani, Zohreh; Vlasenko, Bogdan; Magimai-Doss, Mathew
Attachments
Yadav_ACMMM2022_2022.pdf