Conversational Speech Recognition Needs Data? Experiments with Austrian German
Type of publication: | Conference paper |
Citation: | Linke_LREC_2022 |
Publication status: | Published |
Booktitle: | Proceedings of the 13th Language Resources and Evaluation Conference |
Year: | 2022 |
Month: | June |
Pages: | 4684--4691 |
Organization: | European Language Resources Association |
Address: | Marseille, France |
Crossref: | Idiap-Internal-RR-10-2022 |
URL: | http://www.lrec-conf.org/proce... |
Abstract: | Conversational speech is one of the most complex automatic speech recognition (ASR) tasks, owing to high inter-speaker variation in both pronunciation and conversational dynamics. This complexity is especially problematic in low-resourced (LR) scenarios. Recent developments in self-supervision have allowed such scenarios to take advantage of large amounts of otherwise unrelated data. In this study, we characterise an LR Austrian German conversational task. We begin with a non-pre-trained baseline and show that fine-tuning a model pre-trained using self-supervision leads to improvements consistent with those in the literature; this extends to cases where a lexicon and language model are included. We also show that the advantage of pre-training indeed arises from the larger database rather than from the self-supervision. Further, using a leave-one-conversation-out technique, we demonstrate that robustness problems remain with respect to inter-speaker and inter-conversation variation. This serves to guide where future research might best be focused in light of the current state of the art. |
Projects |
Idiap |
Authors | |