Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation

Type of publication:	Conference paper
Citation:	Hermann_INTERSPEECH_2023
Publication status:	Published
Booktitle:	Proceedings of Interspeech
Year:	2023
Pages:	156-160
URL:	https://www.isca-speech.org/ar...
DOI:	10.21437/Interspeech.2023-2481
Abstract:	Speakers with dysarthria could particularly benefit from assistive speech technology, but are underserved by current automatic speech recognition (ASR) systems. The differences of dysarthric speech pose challenges, while recording large amounts of training data can be exhausting for patients. In this paper, we synthesise dysarthric speech with a FastSpeech 2-based multi-speaker text-to-speech (TTS) system for ASR data augmentation. We evaluate its few-shot capability by generating dysarthric speech with as few as 5 words from an unseen target speaker and then using it to train speaker-dependent ASR systems. The results indicate that, while the TTS output is not yet of sufficient quality, this could allow easy development of personalised acoustic models for new dysarthric speakers and domains in the future.
Keywords:	Automatic Speech Recognition, Dysarthria, Few-shot Learning, Speech Synthesis
Projects:	Idiap, TAPAS, IICT
Authors:	Hermann, Enno; Magimai-Doss, Mathew
Attachments
Hermann_INTERSPEECH_2023.pdf