Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition
Type of publication: | Conference paper |
Citation: | Dey_INTERSPEECH2019_2019 |
Publication status: | Accepted |
Booktitle: | Proc. of Interspeech 2019 |
Year: | 2019 |
Abstract: | In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional unlabelled speech data is employed through a data-selection mechanism to obtain the best hypothesized output, which is further used to retrain the seed model. However, uncertainties of the model may not be well captured with a single hypothesis. As opposed to this technique, we apply a dropout mechanism to capture the uncertainty by obtaining multiple hypothesized text transcripts of a speech recording. We assume that the diversity of automatically generated transcripts for an utterance will implicitly increase the reliability of the model. Finally, the data-selection process is also applied on these hypothesized transcripts to reduce the uncertainty. Experiments on the freely available TEDLIUM corpus and Adobe's proprietary internal dataset show that the proposed approach significantly reduces ASR errors compared to the baseline model. |
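As a rough illustration of the dropout-based multiple-hypothesis step described in the abstract, the sketch below keeps dropout layers stochastic at inference time and decodes the same utterance several times. This is not the authors' implementation: the model, feature tensor, and greedy decoding are hypothetical placeholders, and PyTorch is assumed only for concreteness.

```python
# Minimal sketch (assumption, not the paper's code): sample several transcripts
# for one utterance by leaving dropout active during decoding.
import torch
import torch.nn as nn


def enable_dropout(model: nn.Module) -> None:
    """Switch only the dropout layers back to training mode after model.eval()."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()


def sample_hypotheses(model: nn.Module, features: torch.Tensor, n_samples: int = 8):
    """Decode the same utterance n_samples times with different dropout masks."""
    model.eval()           # keep batch norm, etc. in inference mode
    enable_dropout(model)  # but let dropout stay stochastic
    hypotheses = []
    with torch.no_grad():
        for _ in range(n_samples):
            logits = model(features)        # hypothetical model: (time, vocab) scores
            tokens = logits.argmax(dim=-1)  # greedy decoding, for illustration only
            hypotheses.append(tokens.tolist())
    return hypotheses
```

The resulting set of diverse transcripts could then be passed to a data-selection step before retraining the seed model, as outlined in the abstract.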
Keywords: | |
Projects: | Innosuisse-SM2 |
Authors: | |