Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition
Type of publication: | Conference paper |
Citation: | Dey_INTERSPEECH2019_2019 |
Publication status: | Accepted |
Booktitle: | Proc. of Interspeech 2019 |
Year: | 2019 |
Abstract: | In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach involves training a seed model on the limited amount of labelled data. Additional unlabelled speech data is employed through a data-selection mechanism to obtain the best hypothesized output, which is further used to retrain the seed model. However, uncertainties of the model may not be well captured with a single hypothesis. As opposed to this technique, we apply a dropout mechanism to capture the uncertainty by obtaining multiple hypothesized text transcripts of a speech recording. We assume that the diversity of automatically generated transcripts for an utterance will implicitly increase the reliability of the model. Finally, the data-selection process is also applied on these hypothesized transcripts to reduce the uncertainty. Experiments on the freely available TEDLIUM corpus and Adobe's proprietary internal dataset show that the proposed approach significantly reduces ASR errors compared to the baseline model. |
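As a rough illustration of the dropout-based multiple-hypothesis step described in the abstract, the sketch below keeps dropout layers stochastic at inference time and decodes the same utterance several times. This is not the authors' implementation: the model, feature tensor, and greedy decoding are hypothetical placeholders, and PyTorch is assumed only for concreteness.

```python
# Minimal sketch (assumption, not the paper's code): sample several transcripts
# for one utterance by leaving dropout active during decoding.
import torch
import torch.nn as nn


def enable_dropout(model: nn.Module) -> None:
    """Switch only the dropout layers back to training mode after model.eval()."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()


def sample_hypotheses(model: nn.Module, features: torch.Tensor, n_samples: int = 8):
    """Decode the same utterance n_samples times with different dropout masks."""
    model.eval()           # keep batch norm, etc. in inference mode
    enable_dropout(model)  # but let dropout stay stochastic
    hypotheses = []
    with torch.no_grad():
        for _ in range(n_samples):
            logits = model(features)        # hypothetical model: (time, vocab) scores
            tokens = logits.argmax(dim=-1)  # greedy decoding, for illustration only
            hypotheses.append(tokens.tolist())
    return hypotheses
```

The resulting set of diverse transcripts could then be passed to a data-selection step before retraining the seed model, as outlined in the abstract.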
Keywords: | |
Projects: | Innosuisse-SM2 |
Authors: | |