Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification
| Type of publication: | Conference paper |
| Citation: | VILLATORO-TELLO_ICASSP'24_2023 |
| Publication status: | Accepted |
| Booktitle: | Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 |
| Year: | 2024 |
| Month: | April |
| Pages: | 12617-12621 |
| Publisher: | IEEE |
| Location: | Seoul, Republic of Korea |
| URL: | https://ieeexplore.ieee.org/do... |
| DOI: | 10.1109/ICASSP48485.2024.10445934 |
| Abstract: | Spoken Language Understanding (SLU) technologies have improved substantially thanks to the effective pretraining of speech representations. A common requirement of industry-based solutions is the portability to deploy SLU models in voice-assistant devices. Thus, distilling knowledge from large text-based language models has become an attractive solution for achieving good performance and guaranteeing portability. In this paper, we introduce a novel architecture that uses a cross-modal attention mechanism to extract bin-level contextual embeddings from a word-confusion network (WCN) encoding such that these can be directly compared and aligned with traditional text-based contextual embeddings. This alignment is achieved using a recently proposed tokenwise contrastive loss function. We validated our architecture's effectiveness by fine-tuning our WCN-based pretrained model to perform intent classification on the SLURP dataset. The obtained accuracy (81%) represents a 9.4% relative improvement over a recent and equivalent E2E method. |
| Main Research Program: | Sustainable & Resilient Societies |
| Additional Research Programs: | AI for Everyone |
| Keywords: | Cross-modal Alignment, Intent Classification, knowledge distillation, Spoken Language Understanding, Word-Confusion-Networks |
| Projects: | UNIPHORE |
| Authors: | |