CONF Motlicek_ICASSP_2012/IDIAP
IMPROVING ACOUSTIC BASED KEYWORD SPOTTING USING LVCSR LATTICES
Motlicek, Petr; Valente, Fabio; Szoke, Igor
Related documents: https://publications.idiap.ch/index.php/publications/showcite/Motlicek_Idiap-RR-36-2012
IEEE - Proceedings on IEEE International Conference on Acoustics, Speech and Signal Processing, Japan, 2012, 4413-4416

This paper investigates the detection of English keywords in a conversational scenario using a combination of acoustic and LVCSR-based keyword spotting (KWS) systems. Acoustic KWS systems search for predefined words in parameterized spoken data. The corresponding confidences are represented by likelihood ratios given the keyword models and a background model. First, because of its especially high number of false alarms, the acoustic KWS system is augmented with confidence measures estimated from the corresponding LVCSR lattices. Then, various strategies for combining the scores estimated by the acoustic and several LVCSR-based KWS systems are explored. We show that a linear-regression-based combination significantly outperforms other (model-based) techniques. As a result, the number of false alarms of the combined KWS system decreased by more than 50% relative to the acoustic KWS system. Finally, attention is also paid to the complexity of the KWS systems, so that they can potentially be exploited in real detection tasks.

REPORT Motlicek_Idiap-RR-36-2012/IDIAP
IMPROVING ACOUSTIC BASED KEYWORD SPOTTING USING LVCSR LATTICES
Motlicek, Petr; Valente, Fabio; Szoke, Igor
Confidence Measure (CM); KeyWord Spotting (KWS); Spoken Term Detection (STD)
EXTERNAL https://publications.idiap.ch/attachments/reports/2012/Motlicek_Idiap-RR-36-2012.pdf
PUBLIC
Idiap-RR-36-2012, 2012
Idiap, Rue Marconi 19
December 2012