<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">CONF</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Yadav_ACMMM2022_2022/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">Comparing Biosignal and Acoustic feature Representation for Continuous Emotion Recognition</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Yadav, Sarthak</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Purohit, Tilak</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Mostaani, Zohreh</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Vlasenko, Bogdan</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Magimai-Doss, Mathew</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">breathing patterns</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Emotion Recognition</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">modalities fusion</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">pre-trained embedding</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Self-supervised embedding</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2="0">
			<subfield code="i">EXTERNAL</subfield>
			<subfield code="u">http://publications.idiap.ch/attachments/papers/2022/Yadav_ACMMM2022_2022.pdf</subfield>
			<subfield code="x">PUBLIC</subfield>
		</datafield>
		<datafield tag="711" ind1="2" ind2=" ">
			<subfield code="a">International Multimodal Sentiment Analysis Workshop and Challenge</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2022</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">abstract:

Automatic recognition of human emotion has a wide range of applications. Human emotions can be identified across different modalities, such as biosignal, speech, text, and mimics. This paper is focusing on time-continuous prediction of level of valence and psycho-physiological arousal. In that regard, we investigate, (a) the use of different feature embeddings obtained from neural networks pre-trained on different speech tasks (e.g., phone classification, speech emotion recognition) and self-supervised neural networks, (b) estimation of arousal and valence from physiological signals in an end-to-end manner and (c) combining different neural embeddings. Our investigations on the MuSe-Stress sub-challenge shows that (a) the embeddings extracted from physiological signals using CNNs trained in an end-to-end manner improves over the baseline approach of modeling physiological signals, (b) neural embeddings obtained from phone classification neural network and speech emotion recognition neural network trained on auxiliary language data sets yield improvement over baseline systems purely trained on the target data, and (c) task specific neural embeddings yield improved performance over self-supervised neural embeddings for both arousal and valence.
Our best performing system on test-set surpass the DeepSpectrum baseline (combined score) by a relative 7.7% margin.</subfield>
		</datafield>
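		<!--
			The 520 abstract above describes fusing biosignal and acoustic embeddings
			for time-continuous arousal/valence regression. Below is a minimal
			Python/PyTorch sketch of one plausible late-fusion regression head;
			all module names and dimensions here are illustrative assumptions,
			not the authors' actual architecture.

			import torch
			import torch.nn as nn

			class FusionRegressor(nn.Module):
				"""Concatenate per-frame biosignal and acoustic embeddings and
				regress one continuous value (e.g., arousal or valence) per frame."""
				def __init__(self, bio_dim=64, acoustic_dim=256, hidden=128):
					super().__init__()
					self.rnn = nn.LSTM(bio_dim + acoustic_dim, hidden, batch_first=True)
					self.head = nn.Linear(hidden, 1)

				def forward(self, bio_emb, acoustic_emb):
					# bio_emb: (batch, time, bio_dim); acoustic_emb: (batch, time, acoustic_dim)
					fused = torch.cat([bio_emb, acoustic_emb], dim=-1)
					out, _ = self.rnn(fused)
					return self.head(out).squeeze(-1)  # (batch, time) predictions
		-->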
	</record>
</collection>
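<!--
	The record above follows the MARC21 slim schema: each datafield tag and
	subfield code pair carries one metadata value (245$a = title, 700$a =
	author, 520$a = abstract, 856$u = full-text URL). A minimal sketch of
	reading those fields with Python's standard library; the file name
	"Yadav_ACMMM2022_2022.xml" is an assumed local copy of this record.

	import xml.etree.ElementTree as ET

	NS = {"m": "http://www.loc.gov/MARC21/slim"}  # MARC21 slim namespace

	tree = ET.parse("Yadav_ACMMM2022_2022.xml")
	record = tree.getroot().find("m:record", NS)

	# Walk every datafield/subfield and print tag, code, and value.
	for field in record.findall("m:datafield", NS):
		tag = field.get("tag")
		for sub in field.findall("m:subfield", NS):
			print(tag, sub.get("code"), sub.text)
-->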