Idiap Research Institute
Pseudo-Syntactic Language Modeling for Disfluent Speech Recognition
Type of publication: Idiap-RR
Citation: McGreevy04a
Number: Idiap-RR-55-2004
Year: 2004
Institution: IDIAP
Note: Published in Proceedings of SST, 2004
Abstract: Language models for speech recognition are generally trained on text corpora. Since these corpora do not contain the disfluencies found in natural speech, there is a train/test mismatch when these models are applied to conversational speech. In this work we investigate a language model (LM) designed to model these disfluencies as a syntactic process. By modeling self-corrections we obtain an improvement over our baseline syntactic model. We also obtain a 30% relative reduction in perplexity from the best performing standard N-gram model when we interpolate it with our syntactically derived models.
Userfields: ipdmembership={speech},
Keywords:
Projects: Idiap
Authors: McGreevy, Michael
Crossref by: McGreevy04b
Attachments
  • rr04-55.pdf
  • rr04-55.ps.gz