Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
| Type of publication: | Conference paper |
| Citation: | Purohit_ICML_2022 |
| Publication status: | Accepted |
| Booktitle: | Proceedings of the ICML Expressive Vocalizations Workshop held in conjunction with the 39th International Conference on Machine Learning |
| Year: | 2022 |
| Location: | Maryland, USA |
| Abstract: | The ICML Expressive Vocalizations (ExVo) Multi-task challenge 2022 focuses on understanding the emotional facets of non-linguistic vocalizations, i.e., vocal bursts (VBs). The objective of this challenge is to predict emotional intensities for VBs; being a multi-task challenge, it also requires predicting speakers' age and native country. For this challenge we study and compare two distinct embedding spaces, namely self-supervised learning (SSL) based embeddings and task-specific supervised learning based embeddings. Towards that, we investigate feature representations obtained from several pre-trained SSL neural networks and task-specific supervised classification neural networks. Our studies show that the best performance is obtained with a hybrid approach, where predictions derived via both SSL and task-specific supervised learning are used. Our best system on the test set surpasses the ComParE baseline (harmonic mean of all sub-task scores, i.e., S_MLT) by a relative 13% margin. |
| Keywords: | Emotion Recognition, Expressive Vocalizations, Multi-task learning, Self-supervised embedding |
| Projects: | EMIL |
| Added by: | [UNK] |
| Total mark: | 0 |
|
Notes
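
The abstract refers to S_MLT, the harmonic mean of the individual sub-task scores, and reports a relative 13% improvement over the ComParE baseline on it. As a minimal sketch (the function and argument names below are placeholders, and each score is assumed to be a higher-is-better value in (0, 1]), the combined score can be computed as:

```python
from statistics import harmonic_mean

def s_mlt(emotion_score: float, age_score: float, country_score: float) -> float:
    """Harmonic mean of the three ExVo sub-task scores (S_MLT).
    Argument names are placeholders for the per-task metrics."""
    return harmonic_mean([emotion_score, age_score, country_score])

# Hypothetical values: 0.60, 0.50 and 0.55 give S_MLT of about 0.547.
print(round(s_mlt(0.60, 0.50, 0.55), 3))
```

Because a harmonic mean is dominated by its smallest term, a weak score on any single sub-task pulls S_MLT down sharply, so a system cannot offset one poor task with the others.

The abstract also mentions feature representations taken from several pre-trained self-supervised networks, without naming them here. The sketch below therefore assumes wav2vec 2.0 via the HuggingFace transformers library purely as an illustration of how clip-level SSL embeddings can be extracted:

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Pre-trained SSL encoder (illustrative choice, not necessarily the one used in the paper).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

# 16 kHz mono waveform; a one-second dummy signal stands in for a vocal-burst clip.
waveform = torch.zeros(16000)

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    frames = model(**inputs).last_hidden_state   # shape: (1, num_frames, 768)

# Mean-pool over time to obtain one fixed-size embedding per clip, which can
# then feed the downstream multi-task regressors/classifiers.
clip_embedding = frames.mean(dim=1)              # shape: (1, 768)
```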