XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models

Type of publication:	Conference paper
Citation:	Kumar_ICASSP2025_2025
Publication status:	Published
Booktitle:	Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Year:	2025
Month:	April
Publisher:	IEEE
Location:	Hyderabad, India
ISSN:	2379-190X
ISBN:	979-8-3503-6874-1
Crossref:	Kumar_Idiap-RR-08-2024: XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models; Kumar, Shashi; Madikeri, Srikanth; Zuluaga-Gomez, Juan; Villatoro-Tello, Esaú; Iuliia, Nigmatulina; Motlicek, Petr; E, Manjunath K; Ganapathiraju, Aravind; Idiap-RR-08-2024
URL:	https://ieeexplore.ieee.org/do...
DOI:	https://doi.org/10.1109/ICASSP49660.2025.10888110
Abstract:	Self-supervised pretrained models exhibit competitive performance in automatic speech recognition (ASR) on finetuning, even with limited in-domain supervised data. However, popular pretrained models are not suitable for streaming ASR because they are trained with full attention context. In this paper, we introduce XLSR-Transducer, where the XLSR-53 model is used as encoder in transducer setup. Our experiments on the AMI dataset reveal that the XLSR-Transducer achieves 4% absolute WER improvement over Whisper large-v2 and 8% over a Zipformer transducer model trained from scratch. To enable streaming capabilities, we investigate different attention masking patterns in the self-attention computation of transformer layers within the XLSR-53 model. We validate XLSR-Transducer on AMI and 5 languages from CommonVoice under low-resource scenarios. Finally, with the introduction of attention sinks, we reduce the left context by half while achieving a relative 12% improvement in WER.
Main Research Program:	Human-AI Teaming
Additional Research Programs:	AI for Everyone
Keywords:	self-supervised learning, streaming ASR, transformer transducer, XLSR
Projects:	UNIPHORE ELOQUENCE
Authors:	Kumar, Shashi; Madikeri, Srikanth; Zuluaga-Gomez, Juan; Villatoro-Tello, Esaú; Thorbecke, Iuliia; Motlicek, Petr; E, Manjunath K; Ganapathiraju, Aravind
Attachments
Kumar_ICASSP2025_2025.pdf
