Idiap Research Institute
Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification
Type of publication: Conference paper
Citation: VILLATORO-TELLO_ICASSP'24_2023
Publication status: Accepted
Booktitle: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024
Year: 2024
Month: April
Pages: 12617-12621
Publisher: IEEE
Location: Seoul, Republic of Korea
URL: https://ieeexplore.ieee.org/do...
DOI: 10.1109/ICASSP48485.2024.10445934
Abstract: Spoken Language Understanding (SLU) technologies have improved substantially due to the effective pretraining of speech representations. A common requirement of industry-based solutions is portability, so that SLU models can be deployed in voice-assistant devices. Thus, distilling knowledge from large text-based language models has become an attractive solution for achieving good performance and guaranteeing portability. In this paper, we introduce a novel architecture that uses a cross-modal attention mechanism to extract bin-level contextual embeddings from a word-confusion network (WCN) encoding such that these can be directly compared and aligned with traditional text-based contextual embeddings. This alignment is achieved using a recently proposed tokenwise contrastive loss function. We validated our architecture's effectiveness by fine-tuning our WCN-based pretrained model to perform intent classification on the SLURP dataset. The obtained accuracy (81%) represents a 9.4% relative improvement over a recent and equivalent E2E method.
Keywords: Cross-modal Alignment, Intent Classification, Knowledge Distillation, Spoken Language Understanding, Word-Confusion-Networks
Projects: UNIPHORE
Authors: Villatoro-Tello, Esaú
Madikeri, Srikanth
Sharma, Bidisha
Khalil, Driss
Kumar, Shashi
Nigmatulina, Iuliia
Motlicek, Petr
Ganapathiraju, Aravind
Attachments
  • VILLATORO-TELLO_ICASSP24_2023.pdf