Open-Vocabulary Keyword Spotting With Audio And Text Embeddings

Type of publication:	Conference paper
Citation:	Sacchi_INTERSPEECH_2019
Publication status:	Published
Booktitle:	Proceedings of Interspeech 2019
Year:	2019
DOI:	10.21437/Interspeech.2019-1846
Abstract:	Keyword spotting (KWS) systems detect a set of pre-defined spoken keywords. Open-vocabulary KWS systems search for the keywords in the set of word hypotheses generated by an automatic speech recognition (ASR) system, which is computationally expensive and therefore often implemented as a cloud-based service. Alternatively, KWS systems can use word classification algorithms, but these do not allow the set of recognized words to be changed easily, because the classes have to be defined a priori, before the system is trained. In this paper, we propose an open-vocabulary, ASR-free KWS system based on speech and text encoders whose computed embeddings can be matched to determine whether a keyword has been uttered. This approach allows the set of keywords to be chosen a posteriori while requiring low computational power. Experiments on two different datasets show that our method is competitive with other state-of-the-art KWS systems while remaining flexible to configure and computationally efficient.
Keywords:	ASR-free, audio&text embeddings, keyword spotting, open vocabulary, speech recognition
Projects:	CTI-Shaped
Authors:	Sacchi, Niccolò; Nanchen, Alexandre; Jaggi, Martin; Cernak, Milos
Attachments
Sacchi_INTERSPEECH_2019.pdf
Notes
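
The abstract describes matching speech and text embeddings to decide whether a keyword was uttered. A minimal sketch of that matching step follows. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the similarity measure or decision rule, so cosine similarity and a fixed threshold are assumed here, and the hand-written vectors merely stand in for the outputs of the speech and text encoders. The names `cosine_similarity`, `spot_keywords`, and `threshold` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def spot_keywords(
    audio_embedding: Sequence[float],
    keyword_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed decision threshold, not taken from the paper
) -> List[str]:
    """Return the keywords whose text embedding is close enough to the audio embedding."""
    return [
        keyword
        for keyword, text_embedding in keyword_embeddings.items()
        if cosine_similarity(audio_embedding, text_embedding) >= threshold
    ]


# Toy stand-ins for encoder outputs (assumption: both encoders map into one shared space).
keyword_embeddings = {
    "lights": [1.0, 0.0, 0.0],
    "music": [0.0, 1.0, 0.0],
}
audio_embedding = [0.9, 0.1, 0.0]  # pretend this came from the speech encoder

detected = spot_keywords(audio_embedding, keyword_embeddings)

# Choosing keywords a posteriori: add a new text embedding, no retraining involved.
keyword_embeddings["stop"] = [0.0, 0.0, 1.0]
detected_after_update = spot_keywords([0.0, 0.1, 0.9], keyword_embeddings)
```

In this sketch, changing the keyword set only means adding or removing entries in `keyword_embeddings`, which mirrors the flexibility the abstract claims; the per-utterance cost is one similarity computation per keyword.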
