Idiap Research Institute
Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech
Type of publication: Conference paper
Citation: ElHajal_INTERSPEECH2025_2025
Publication status: Accepted
Booktitle: Proceedings of Interspeech
Year: 2025
Month: August
Publisher: ISCA
Location: Rotterdam, Netherlands
URL: https://arxiv.org/abs/2506.016...
Abstract: Automatic speech recognition (ASR) systems struggle with dysarthric speech due to high inter-speaker variability and slow speaking rates. To address this, we explore dysarthric-to-healthy speech conversion for improved ASR performance. Our approach extends the Rhythm and Voice (RnV) conversion framework by introducing a syllable-based rhythm modeling method suited for dysarthric speech. We assess its impact on ASR by training LF-MMI models and fine-tuning Whisper on converted speech. Experiments on the Torgo corpus reveal that LF-MMI achieves significant word error rate reductions, especially for more severe cases of dysarthria, while fine-tuning Whisper on converted data has minimal effect on its performance. These results highlight the potential of unsupervised rhythm and voice conversion for dysarthric ASR. Code available at: https://github.com/idiap/RnV.
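The results above are reported as word error rate (WER). As a reminder of how that metric is conventionally defined, here is a minimal sketch: WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical and does not come from the RnV repository. The paper's own scoring pipeline may differ, for example in text normalization, so treat this as an illustration of the metric rather than the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits turning ref[:i-1] into hyp[:j];
    # for the empty reference prefix that is j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # Turning ref[:i] into an empty hypothesis takes i deletions.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # 0 cost on a match
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Illustrative strings only, not Torgo data.
    print(word_error_rate("the cat sat on the mat", "the cat sat mat"))  # two deletions out of six words
```

Because insertions are counted in the numerator, WER is not bounded by 1.0: a hypothesis with many spurious words can score above 100%.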
Projects: PaSS, IICT, EMIL
Authors: El Hajal, Karl; Hermann, Enno; Hovsepyan, Sevada; Magimai-Doss, Mathew
Attachments
  • ElHajal_INTERSPEECH2025_2025.pdf