COMPARISON OF SUBWORD SEGMENTATION METHODS FOR OPEN-VOCABULARY END-TO-END SPEECH RECOGNITION
Type of publication: | Idiap-RR |
Citation: | Khosravani_Idiap-RR-34-2020 |
Number: | Idiap-RR-34-2020 |
Year: | 2020 |
Month: | 12 |
Institution: | Idiap |
Note: | Submitted to SLT 2021 conference, DAHL project |
Abstract: | To address the open-vocabulary problem in end-to-end automatic speech recognition (ASR), we experiment with two subword segmentation approaches: byte-pair encoding and the unigram language model. Such approaches are attractive for morphologically rich languages in general, and for German in particular. We propose a technique that computes the tokenization rate of an utterance transcription, in the spirit of the out-of-vocabulary (OOV) metric used for closed vocabularies. We show that this tokenization rate can be used to rank evaluation utterances by recognition difficulty. Using this technique, we show that the optimal choice of subword segmentation method depends on the expected tokenization rate of the domain. We further show that a hybrid solution exists and can lead to improved performance. For the ASR model, we employ wav2letter, a fully convolutional sequence-to-sequence encoder architecture that uses time-depth separable convolution blocks and lexicon-free beam-search decoding with an n-gram subword language model. |
Keywords: | end-to-end, German language, open-vocabulary, speech recognition, subword segmentation |
Projects |
Idiap |
Authors | |
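Illustration (not part of the report): the abstract does not define the tokenization rate, so the sketch below assumes one plausible reading, namely the fraction of words in a transcription that a subword vocabulary splits into more than one token. The segmenter is a toy greedy longest-match routine standing in for byte-pair encoding or the unigram language model, and the names `segment`, `tokenization_rate`, and `rank_by_difficulty`, as well as the example vocabulary, are hypothetical.

```python
from typing import List, Set


def segment(word: str, vocab: Set[str]) -> List[str]:
    """Toy greedy longest-match segmentation (an assumption, not BPE or unigram LM).

    Falls back to single characters when no vocabulary entry matches.
    """
    pieces: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; a single character always succeeds.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def tokenization_rate(transcript: str, vocab: Set[str]) -> float:
    """Assumed definition: share of words split into more than one subword."""
    words = transcript.split()
    if not words:
        return 0.0
    split_words = sum(1 for w in words if len(segment(w, vocab)) > 1)
    return split_words / len(words)


def rank_by_difficulty(transcripts: List[str], vocab: Set[str]) -> List[str]:
    """Order utterances from highest to lowest assumed tokenization rate."""
    return sorted(transcripts, key=lambda t: tokenization_rate(t, vocab), reverse=True)


# Hypothetical vocabulary and utterances, for illustration only.
vocab = {"das", "haus", "tür", "ist", "neu"}
utterances = ["das haus ist neu", "die haustür ist neu"]
ranked = rank_by_difficulty(utterances, vocab)
```

Under this assumed definition, an utterance made up entirely of in-vocabulary words scores 0.0, while compounds and unseen words raise the rate, which is what makes it usable as a rough difficulty ranking in the sense the abstract describes.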