How Does Pre-trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

Type of publication:	Conference paper
Citation:	Juan_SLT2023-2_2023
Publication status:	Accepted
Booktitle:	2023 IEEE Spoken Language Technology Workshop (SLT)
Series:	1
Volume:	1
Number:	1
Year:	2023
Month:	January
Organization:	IEEE
URL:	https://arxiv.org/abs/2203.168...
Abstract:	Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet few works have investigated the impact on performance when the data properties differ substantially between the pre-training and fine-tuning phases, termed domain shift. We target this scenario by analyzing the robustness of Wav2Vec 2.0 and XLS-R models on downstream ASR for a completely unseen domain, air traffic control (ATC) communications. We benchmark these two models on several open-source and challenging ATC databases with signal-to-noise ratios between 5 and 20 dB. Relative word error rate (WER) reductions of 20% to 40% are obtained in comparison to hybrid-based ASR baselines by only fine-tuning E2E acoustic models with a smaller fraction of labeled data. We analyze WERs in the low-resource scenario and the gender bias carried by one ATC dataset.
Keywords:	air traffic control communications, Automatic Speech Recognition, self-supervised pre-training, wav2vec 2.0
Projects:	Idiap; HAAWAII; EC H2020-ATCO2
Authors:	Zuluaga-Gomez, Juan; Prasad, Amrutha; Iuliia, Nigmatulina; Sarfjoo, Seyyed Saeed; Motlicek, Petr; Kleinert, Matthias; Helmke, Hartmut; Ohneiser, Oliver; Zhan, Qingran
Attachments
Juan_SLT2023-2_2023.pdf