Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering

Type of publication:	Conference paper
Citation:	Rangappa_ICASSP2025_2025
Publication status:	Accepted
Booktitle:	2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)
Year:	2025
Month:	April
Abstract:	In real-world speech data processing, the scarcity of annotated data and the abundance of unlabelled speech data present a significant challenge. To address this, we propose an efficient data selection pipeline for fine-tuning ASR models: pseudo-labels are generated with the WhisperX pipeline, and only the most useful of them are selected for fine-tuning. We first apply a domain classifier built from computationally inexpensive TF-IDF features and a classical machine learning algorithm. We then filter the classifier output using a novel metric that assesses word ratio and perplexity distribution. The filtered pseudo-labels are used to fine-tune standard encoder-decoder Whisper models and Zipformer. The proposed pipeline reduces the dataset to approximately 1/100th of its original size while maintaining performance comparable to the full dataset, and it outperforms random, domain-independent selection strategies.
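
The two selection stages named in the abstract (a TF-IDF domain classifier, then filtering by word ratio and perplexity) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier stands in for the unspecified "classical machine learning algorithm", "word ratio" is read as words per second of audio, and the function names, thresholds, and toy sentences are hypothetical. The perplexity value is assumed to come from an external language model.

```python
import math
from collections import Counter, defaultdict


def fit_tfidf_centroids(texts, labels):
    """Fit smoothed IDF weights and one mean TF-IDF vector per domain label."""
    docs = [t.lower().split() for t in texts]
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    sums = defaultdict(Counter)
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for term, tf in Counter(doc).items():
            sums[label][term] += tf * idf[term]
    centroids = {
        lab: {t: w / counts[lab] for t, w in vec.items()}
        for lab, vec in sums.items()
    }
    return idf, centroids


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_domain(text, idf, centroids):
    """Assign a transcript to the domain with the most similar centroid."""
    tf = Counter(w for w in text.lower().split() if w in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


def keep_pseudo_label(num_words, duration_s, perplexity,
                      rate_range=(1.0, 4.0), max_perplexity=200.0):
    """Hypothetical filter: keep an utterance only if its speaking rate
    (words per second) is plausible and its LM perplexity is not too high.
    The thresholds are illustrative placeholders, not values from the paper."""
    if duration_s <= 0 or num_words == 0:
        return False
    rate = num_words / duration_s
    return rate_range[0] <= rate <= rate_range[1] and perplexity <= max_perplexity


# Toy in-domain / out-of-domain transcripts (invented for this sketch).
train_texts = [
    "please check my account balance",
    "transfer money to savings account",
    "rain expected tomorrow afternoon",
    "sunny weather with light wind",
]
train_labels = ["banking", "banking", "weather", "weather"]
idf, centroids = fit_tfidf_centroids(train_texts, train_labels)

domain = predict_domain("what is my account balance", idf, centroids)
selected = domain == "banking" and keep_pseudo_label(5, 2.0, 80.0)
```

In this sketch an utterance is selected only when it is classified into the target domain and also passes the rate and perplexity checks; in practice the thresholds would be chosen from the observed distributions rather than fixed by hand.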
Keywords:
Projects:	UNIPHORE, ELOQUENCE
Authors:	Rangappa, Pradeep; Zuluaga-Gomez, Juan; Madikeri, Srikanth; Carofilis, Andrés; Prakash, Jeena; Burdisso, Sergio; Kumar, Shashi; Villatoro-Tello, Esaú; Iuliia, Nigmatulina; Motlicek, Petr; S, Karthik Pandia D; Ganapathiraju, Aravind
Attachments
Rangappa_ICASSP2025_2025.pdf