Idiap Research Institute
A COMPARISON OF METHODS FOR OOV-WORD RECOGNITION ON A NEW PUBLIC DATASET
Type of publication: Conference paper
Citation: Braun_ICASSP2021_2021
Publication status: Published
Booktitle: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
Year: 2021
Month: June
Location: Toronto, Ontario, Canada
Organization: IEEE Signal Processing Society
Abstract: A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose using the CommonVoice dataset to create test sets for multiple languages which have a high out-of-vocabulary (OOV) ratio relative to a training set and release a new tool for calculating relevant performance metrics. We then evaluate, within the context of a hybrid ASR system, how much better subword models are at recognizing OOVs, and how much benefit one can get from incorporating OOV-word information into an existing system by modifying WFSTs. Additionally, we propose a new method for modifying a subword-based language model so as to better recognize OOV-words. We showcase very large improvements in OOV-word recognition and make both the data and code available.
Keywords:
Projects: Idiap, EC H2020-ATCO2, SARAL
Authors: Braun, Rudolf; Madikeri, Srikanth; Motlicek, Petr
Attachments
  • Braun_ICASSP2021_2021.pdf