Idiap Research Institute
Emotion information recovery potential of wav2vec2 network fine-tuned for speech recognition task
Type of publication: Conference paper
Citation: Purohit_ICASSP-4_2025
Booktitle: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Year: 2025
Crossref: Purohit_ICASSP-3_2025
Abstract: Fine-tuning has become the norm for achieving state-of-the-art performance when employing pre-trained networks such as foundation models. These models are typically pre-trained on large-scale unannotated data using self-supervised learning (SSL) methods. SSL-based pre-training on large-scale data enables the network to learn the inherent structure and properties of the data, giving it the capacity to generalize and to transfer knowledge to various downstream tasks. However, when fine-tuned for a specific task, these models become task-specific. Fine-tuning may distort the patterns learned by the network during pre-training. In this work, we investigate these distortions by analyzing the network's information recovery capabilities, designing a study in which speech emotion recognition is the target task and automatic speech recognition is an intermediary task. We show that the network recovers the task-specific information, but with a shift in the decisions. Through attention analysis, we also demonstrate that some layers do not recover the information fully.
Keywords:
Projects: EMIL
Authors: Purohit, Tilak; Magimai-Doss, Mathew
Attachments
  • Purohit_ICASSP-4_2025.pdf