SPARSE AUTOENCODERS TO ENHANCE SPEECH RECOGNITION

Type of publication:	Idiap-RR
Citation:	Kabil_Idiap-RR-10-2022
Number:	Idiap-RR-10-2022
Year:	2022
Month:	8
Institution:	Idiap
Note:	Submitted to ICASSP 2021
Abstract:	Starting from a state-of-the-art speech recognition system as baseline, we investigate the use of sparse autoencoders to further process the baseline acoustic model outputs (senone likelihoods) to obtain even more reliable senone and phone likelihoods with better classification performance. Inspired by dictionary learning, we use shallow overcomplete sparse autoencoders to project the senone likelihoods into a high-dimensional sparse space, which is expected to increase the separability among different speech units while suppressing distortions. On the Augmented Multiparty Interaction (AMI) dataset, under both Individual Head-mounted Microphone (IHM) and Single-Distant Microphone (SDM) conditions, we show that classifiers trained on the high-dimensional sparse representations from the sparse autoencoders improve senone and phone classification performance compared to the baseline, especially in the noisy (SDM) condition. For word recognition, we obtain promising improvements when a weighted combination of the senone likelihoods from the baseline acoustic model and from our senone classifier is used for decoding.
Keywords:	Deep neural network, high-dimensional sparse representations, sparse autoencoder, speech recognition
Projects:	Idiap SHISSM
Authors:	Kabil, Selen Hande; Bourlard, Hervé
Attachments
Kabil_Idiap-RR-10-2022.pdf (MD5: b9ad63fbacc820298d08c6acbd17ec41)
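
As an illustration of the approach summarized in the abstract, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the architecture or the combination rule, so several details here are assumptions: a ReLU encoder with tied decoder weights, an L1 sparsity penalty with weight `l1`, and a linear interpolation with weight `alpha` followed by renormalization. All function names are hypothetical.

```python
def encode(x, weights, biases):
    """Overcomplete ReLU encoder: one hidden unit per row of `weights`.

    Assumption: len(weights) > len(x), so the code is higher-dimensional
    than the input senone likelihood vector.
    """
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def decode(h, weights):
    """Tied-weight linear decoder (assumption): x_hat = W^T h."""
    dim = len(weights[0])
    return [sum(weights[k][j] * h[k] for k in range(len(h))) for j in range(dim)]


def sparse_ae_loss(x, weights, biases, l1=0.1):
    """Squared reconstruction error plus an L1 penalty on the hidden code."""
    h = encode(x, weights, biases)
    x_hat = decode(h, weights)
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in h)
    return reconstruction + l1 * sparsity


def combine_likelihoods(baseline, classifier, alpha=0.5):
    """Weighted combination of two per-frame senone score vectors.

    Assumption: linear interpolation, renormalized to sum to 1.
    `alpha` is the weight given to the baseline acoustic model.
    """
    if len(baseline) != len(classifier):
        raise ValueError("vectors must cover the same senones")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mixed = [alpha * b + (1.0 - alpha) * c for b, c in zip(baseline, classifier)]
    total = sum(mixed)
    if total <= 0.0:
        raise ValueError("combined scores must have positive mass")
    return [m / total for m in mixed]
```

In practice `alpha` would be tuned on a development set, and the combined scores would replace the baseline senone likelihoods passed to the decoder.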