Sparse Subspace Modeling for Query by Example Spoken Term Detection

Type of publication:	Journal paper
Citation:	Ram_IEEETASLP_2018
Publication status:	Accepted
Journal:	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year:	2018
Abstract:	This paper focuses on the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. Current state-of-the-art approaches to this problem rely on dynamic-programming-based template matching using phone posterior features extracted at the output of a deep neural network (DNN). It has previously been shown that the space of phone posteriors is highly structured, forming a union of low-dimensional subspaces. To exploit the temporal and sparse structure of speech data, we investigate three QbE-STD systems based on sparse model recovery. Specifically, we use query examples to model the query subspace with a dictionary for sparse coding. Reconstruction errors calculated from the sparse representations of feature vectors are then used to characterize the underlying subspaces. The first approach uses these reconstruction errors in a dynamic programming framework to detect the spoken query, resulting in a much faster search than standard template matching. The other two methods combine template matching and sparsity-based approaches to further improve performance. The first of these regularizes the template-matching local distances using sparse reconstruction errors. The second uses the sparse reconstruction errors to rescore (improve) the template-matching likelihood. Experiments on two databases (AMI and MediaEval) show that the proposed hybrid systems perform better than a highly competitive QbE-STD baseline system.
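The hybrid idea of regularizing template-matching local distances with sparse reconstruction errors can be illustrated with a small Python sketch. It is not the authors' implementation: the dictionary built directly from the query's own posterior frames, the orthogonal matching pursuit (OMP) solver with a fixed sparsity `k`, the `-log` inner-product local distance, the additive weight `lam`, the toy posteriors, and all function names are hypothetical choices made for illustration. Each test frame's reconstruction error over the query dictionary is added to its whole column of local distances, so frames lying far from the query subspace become expensive to align, and a subsequence DTW then finds the best-matching segment.

```python
import numpy as np

EPS = 1e-10  # floor for inner products before taking the log


def build_query_dictionary(query):
    """Hypothetical dictionary: the query's posterior frames as L2-normalized columns."""
    atoms = query.T.astype(float)
    return atoms / np.linalg.norm(atoms, axis=0, keepdims=True)


def omp_residual_norm(D, x, k):
    """Reconstruction error ||x - D a|| after k-sparse orthogonal matching pursuit."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    for _ in range(min(k, D.shape[1])):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return float(np.linalg.norm(residual))


def log_inner_product_distance(query, test):
    """Local distances d(i, j) = -log(q_i . x_j) between posterior frames."""
    return -np.log(np.clip(query @ test.T, EPS, None))


def subsequence_dtw(cost):
    """Return (lowest accumulated cost, end frame) of a match starting anywhere in the test."""
    m, n = cost.shape
    acc = np.full((m, n), np.inf)
    acc[0, :] = cost[0, :]  # free start: the first query frame may align to any test frame
    for i in range(1, m):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, n):
            acc[i, j] = cost[i, j] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    end = int(np.argmin(acc[-1]))
    return float(acc[-1, end]), end


def regularized_search(query, test, k=2, lam=1.0):
    """Template matching whose local distances are penalized by sparse reconstruction errors."""
    D = build_query_dictionary(query)
    errors = np.array([omp_residual_norm(D, x, k) for x in test])
    # Add each test frame's error to its whole column of the local-distance matrix.
    cost = log_inner_product_distance(query, test) + lam * errors[np.newaxis, :]
    return subsequence_dtw(cost)


# Toy 4-class "posteriors": the query is phone A followed by phone B.
A = [0.90, 0.05, 0.03, 0.02]
B = [0.05, 0.90, 0.03, 0.02]
C = [0.02, 0.03, 0.90, 0.05]
E = [0.02, 0.03, 0.05, 0.90]
query = np.array([A, B])
test = np.array([C, C, A, B, E, E])
score, end = regularized_search(query, test, k=2, lam=1.0)
print(f"best match ends at test frame {end} with accumulated cost {score:.3f}")
```

With `lam=0` the search reduces to plain subsequence DTW template matching. The returned score is an unnormalized accumulated cost; a real detector would typically normalize it (for example by path length) before comparing it against a threshold.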
Projects:	PHASER-QUAD
Authors:	Ram, Dhananjay; Asaei, Afsaneh; Bourlard, Hervé