COMPARISON OF SUBWORD SEGMENTATION METHODS FOR OPEN-VOCABULARY END-TO-END SPEECH RECOGNITION

Type of publication:	Idiap-RR
Citation:	Khosravani_Idiap-RR-34-2020
Number:	Idiap-RR-34-2020
Year:	2020
Month:	12
Institution:	Idiap
Note:	Submitted to SLT 2021 conference, DAHL project
Abstract:	To address the open-vocabulary problem in end-to-end automatic speech recognition (ASR), we experiment with two subword segmentation approaches: byte-pair encoding and the unigram language model. Such approaches are attractive for morphologically rich languages in general, and for German in particular. We propose a technique that computes the tokenization rate of an utterance transcription, in the spirit of the out-of-vocabulary (OOV) metric used for closed vocabularies. We show that this tokenization rate can be used to rank evaluation utterances by recognition difficulty. Using this technique, we show that the optimal choice of subword segmentation method depends on the expected tokenization rate of the domain. We further show that a hybrid solution exists and can lead to improved performance. For the ASR model, we employ wav2letter, a fully convolutional sequence-to-sequence encoder architecture using time-depth separable convolution blocks, with lexicon-free beam search decoding using an n-gram subword language model.
Keywords:	end-to-end, German language, open-vocabulary, speech recognition, subword segmentation
Projects:	Idiap
Authors:	Khosravani, Abbas; Musat, Claudiu; Garner, Philip N.; Lazaridis, Alexandros
Attachments
Khosravani_Idiap-RR-34-2020.pdf (MD5: 5bf175534665fa6d0f8f3a09ad353151)