CONF Povey_ASRU2011_2011/IDIAP
The Kaldi Speech Recognition Toolkit
Povey, Daniel; Ghoshal, Arnab; Boulianne, Gilles; Burget, Lukas; Glembek, Ondrej; Goel, Nagendra; Hannemann, Mirko; Motlicek, Petr; Qian, Yanmin; Schwarz, Petr; Silovsky, Jan; Stemmer, Georg; Vesely, Karel
Keywords: ASR, Automatic Speech Recognition, GMM, HTK, SGMM
EXTERNAL http://publications.idiap.ch/attachments/papers/2012/Povey_ASRU2011_2011.pdf
PUBLIC http://publications.idiap.ch/index.php/publications/showcite/Povey_Idiap-RR-04-2012
IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, Hilton Waikoloa Village, Big Island, Hawaii, US, 2011
IEEE Signal Processing Society, IEEE Catalog No.: CFP11SRW-USB, ISBN 978-1-4673-0366-8

We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state transducers (using the freely available OpenFst), together with detailed documentation and scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

REPORT Povey_Idiap-RR-04-2012/IDIAP
The Kaldi Speech Recognition Toolkit
Povey, Daniel; Ghoshal, Arnab; Boulianne, Gilles; Burget, Lukas; Glembek, Ondrej; Goel, Nagendra; Hannemann, Mirko; Motlicek, Petr; Qian, Yanmin; Schwarz, Petr; Silovsky, Jan; Stemmer, Georg; Vesely, Karel
Keywords: ASR, Automatic Speech Recognition, GMM, HTK, SGMM
Idiap-RR-04-2012, Idiap, Rue Marconi 19, Martigny, January 2012

We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.
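The abstracts mention support for linear and affine feature transforms. As a hedged illustration (not Kaldi's own code), the sketch below assumes the convention that a linear transform is a d x d matrix A, while an affine transform is a d x (d+1) matrix [A | b] whose last column is an offset, so a feature vector x maps to y = A x + b. The function name ApplyFeatureTransform is hypothetical, and plain std::vector stands in for Kaldi's own matrix classes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of Kaldi. Applies a linear (d_out x d_in)
// or affine (d_out x (d_in + 1)) transform M to the feature vector x.
// In the affine case the last column of each row is the offset b_i,
// so y_i = sum_j A_ij * x_j + b_i.
std::vector<double> ApplyFeatureTransform(
    const std::vector<std::vector<double>>& M,
    const std::vector<double>& x) {
  const std::size_t d_in = x.size();
  std::vector<double> y;
  y.reserve(M.size());
  for (const auto& row : M) {
    const bool affine = (row.size() == d_in + 1);
    if (!affine && row.size() != d_in) {
      throw std::invalid_argument(
          "each transform row must have d or d+1 columns");
    }
    double sum = affine ? row[d_in] : 0.0;  // start from offset b_i, if any
    for (std::size_t j = 0; j < d_in; ++j) {
      sum += row[j] * x[j];  // linear part: row i of A times x
    }
    y.push_back(sum);
  }
  return y;
}
```

For example, with A = [[1, 0], [0, 2]], b = [0.5, -1] (stored as M = {{1, 0, 0.5}, {0, 2, -1}}) and x = [3, 4], the function returns [3.5, 7]. Storing the offset as an extra column lets one routine handle both the linear and the affine case, deciding by the column count.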