CONF
McGreevy04b/IDIAP
Pseudo-Syntactic Language Modeling for Disfluent Speech Recognition
McGreevy, Michael
EXTERNAL
https://publications.idiap.ch/attachments/papers/2004/mcgreevy-sst04.pdf
PUBLIC
Related documents: https://publications.idiap.ch/index.php/publications/showcite/mcgreevy04a
Proceedings of SST 2004 (10th Australian International Conference on Speech Science & Technology), Sydney, Australia
December 2004
IDIAP-RR 04-55
Language models for speech recognition are generally trained on text corpora. Since these corpora do not contain the disfluencies found in natural speech, there is a train/test mismatch when these models are applied to conversational speech. In this work we investigate a language model (LM) designed to model these disfluencies as a syntactic process. By modeling self-corrections we obtain an improvement over our baseline syntactic model. We also obtain a 30% relative reduction in perplexity over the best-performing standard N-gram model when we interpolate it with our syntactically derived models.
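The interpolation and perplexity figures in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the probability values, the mixing weight `lam`, and the helper names are hypothetical, and it only shows the standard recipe of linearly mixing per-word probabilities from two LMs and measuring perplexity on the result.

```python
import math

def interpolate(p_ngram, p_syntactic, lam=0.5):
    """Linearly mix the word probabilities of two language models."""
    return lam * p_ngram + (1.0 - lam) * p_syntactic

def perplexity(probs):
    """Perplexity of a test sequence given its per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Toy per-word probabilities from two hypothetical models
ngram_probs = [0.10, 0.05, 0.20, 0.08]
syn_probs = [0.12, 0.09, 0.15, 0.11]

mixed = [interpolate(a, b, lam=0.6) for a, b in zip(ngram_probs, syn_probs)]
print(perplexity(ngram_probs), perplexity(mixed))
```

On these toy numbers the interpolated model already scores a lower perplexity than the N-gram model alone; in the paper the reported gain is a 30% relative reduction.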
REPORT
McGreevy04a/IDIAP
Pseudo-Syntactic Language Modeling for Disfluent Speech Recognition
McGreevy, Michael
EXTERNAL
https://publications.idiap.ch/attachments/reports/2004/rr04-55.pdf
PUBLIC
Idiap-RR-55-2004
2004
IDIAP
Published in Proceedings of SST, 2004
Language models for speech recognition are generally trained on text corpora. Since these corpora do not contain the disfluencies found in natural speech, there is a train/test mismatch when these models are applied to conversational speech. In this work we investigate a language model (LM) designed to model these disfluencies as a syntactic process. By modeling self-corrections we obtain an improvement over our baseline syntactic model. We also obtain a 30% relative reduction in perplexity over the best-performing standard N-gram model when we interpolate it with our syntactically derived models.