Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?

Type of publication:	Conference paper
Citation:	Sarkar_INTERSPEECH_2023
Publication status:	Accepted
Booktitle:	Proceedings of Interspeech
Year:	2023
Abstract:	Self-supervised learning (SSL) models use only the intrinsic structure of a given signal, independent of its acoustic domain, to extract essential information from the input to an embedding space. This implies that the utility of such representations is not limited to modeling human speech alone. Building on this understanding, this paper explores the cross-transferability of SSL neural representations learned from human speech to analyze bio-acoustic signals. We conduct a caller discrimination analysis and a caller detection study on Marmoset vocalizations using eleven SSL models pre-trained with various pretext tasks. The results show that the embedding spaces carry meaningful caller information and can successfully distinguish the individual identities of Marmoset callers without fine-tuning. This demonstrates that representations pre-trained on human speech can be effectively applied to the bio-acoustics domain, providing valuable insights for future investigations in this field.
Keywords:
Projects	Idiap EVOLANG
Authors	Sarkar, Eklavya; Magimai-Doss, Mathew
Attachments
Sarkar_INTERSPEECH_2023.pdf