<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">CONF</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Purohit_ICASSP-2_2025/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">Emotion information recovery potential of wav2vec2 network fine-tuned for speech recognition task</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Purohit, Tilak</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Magimai-Doss, Mathew</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">ASR</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">domain adaptation</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Finetuning</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Foundation Models</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Speech Emotion Recognition</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">wav2vec2.0</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2="0">
			<subfield code="i">EXTERNAL</subfield>
			<subfield code="u">http://publications.idiap.ch/attachments/papers/2025/Purohit_ICASSP-2_2025.pdf</subfield>
			<subfield code="x">PUBLIC</subfield>
		</datafield>
		<datafield tag="711" ind1="2" ind2=" ">
			<subfield code="a">Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)</subfield>
			<subfield code="c">Hyderabad, India</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2025</subfield>
			<subfield code="b">IEEE</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">Fine-tuning has become a norm to achieve state-of-the-art performance when employing pre-trained networks like foundation models. These models are typically pre-trained on large-scale unannotated data using self-supervised learning (SSL) methods. The SSL-based pre-training on large-scale data enables the network to learn the inherent structure/properties of the data, providing it with capabilities in generalization and knowledge transfer for various downstream tasks. However, when fine-tuned for a specific task, these models become task-specific. Finetuning may cause distortions in the patterns learned by the network during pre-training.  In this work, we investigate these distortions by analyzing the network's information recovery capabilities by designing a study where speech emotion recognition is the target task and automatic speech recognition is an intermediary task. 
We show that the network recovers the task-specific information but with a shift in the decisions also through attention analysis, we demonstrate some layers do not recover the information fully.</subfield>
		</datafield>
	</record>
</collection>