Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering
| Type of publication: | Conference paper |
| Citation: | Rangappa_ICASSP2025_2025 |
| Publication status: | Published |
| Booktitle: | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025) |
| Year: | 2025 |
| Month: | April |
| URL: | https://ieeexplore.ieee.org/do... |
| DOI: | 10.1109/ICASSP49660.2025.10888138 |
| Abstract: | In real-world speech data processing, the scarcity of annotated data and the abundance of unlabelled speech data present a significant challenge. To address this, we propose an efficient data selection pipeline for fine-tuning ASR models: pseudo-labels are generated with the WhisperX pipeline, and an efficient subset of them is selected for fine-tuning. We propose a domain classifier built on computationally inexpensive TF-IDF features and a classical machine learning algorithm. We then filter the classifier output using a novel metric that assesses word ratio and perplexity distribution. The filtered pseudo-labels are used to fine-tune standard encoder-decoder Whisper models and Zipformer. Our proposed data selection pipeline reduces the dataset size by approximately 1/100th while maintaining performance comparable to the full dataset, outperforming random domain-independent selection strategies. |
| Main Research Program: | Human-AI Teaming |
| Keywords: | |
| Projects: | UNIPHORE ELOQUENCE |
| Authors: | |
|
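The domain-classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classical algorithm, so a nearest-centroid rule over TF-IDF vectors stands in for it here, and the domain names, training sentences, and pseudo-label strings are all hypothetical. The word-ratio and perplexity filter is not shown, since the abstract does not define that metric.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenisation on lower-cased text (an assumption of this sketch).
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for every term seen in training.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    # L2-normalised sparse TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def fit_centroids(docs, labels, idf):
    # Mean TF-IDF vector per domain label.
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for t, v in tfidf(doc, idf).items():
            acc[t] += v
    return {lab: {t: v / counts[lab] for t, v in acc.items()} for lab, acc in sums.items()}


def predict(text, idf, centroids):
    # Pick the domain whose centroid has the largest dot product with the text.
    vec = tfidf(text, idf)
    scores = {lab: sum(v * c.get(t, 0.0) for t, v in vec.items())
              for lab, c in centroids.items()}
    return max(scores, key=scores.get)


# Hypothetical labelled in-domain / out-of-domain text.
train_texts = [
    "please check my account balance and recent card payment",
    "i want to block my credit card and dispute a payment",
    "the patient reported chest pain and shortness of breath",
    "the doctor prescribed medication for the patient's fever",
]
train_labels = ["banking", "banking", "medical", "medical"]

idf = fit_idf(train_texts)
centroids = fit_centroids(train_texts, train_labels, idf)

# Hypothetical ASR pseudo-labels; keep only those classified as the target domain.
pseudo_labels = [
    "my card payment failed twice",
    "the patient has a high fever",
    "transfer money from my account",
]
selected = [t for t in pseudo_labels if predict(t, idf, centroids) == "banking"]
print(selected)
```

In practice, a library vectoriser and a trained classifier would likely replace these hand-written functions; the sketch only shows why the step is cheap, since it needs nothing beyond term counts over the pseudo-label text.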