ARTICLE
Rasipuram_SPEECHCOM_2015/IDIAP
Acoustic and Lexical Resource Constrained ASR using Language-Independent Acoustic Model and Language-Dependent Probabilistic Lexical Model
Rasipuram, Ramya
Magimai-Doss, Mathew
EXTERNAL
https://publications.idiap.ch/attachments/papers/2015/Rasipuram_SPEECHCOM_2015.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Rasipuram_Idiap-RR-02-2014
Related documents
Speech Communication
68
23–40
2015
http://www.sciencedirect.com/science/article/pii/S0167639314000995
URL
doi:10.1016/j.specom.2014.12.006
doi
One of the key challenges involved in building statistical automatic speech recognition (ASR) systems is modeling the relationship between subword units, or "lexical units", and acoustic feature observations. To model this relationship, two types of resources are needed, namely, acoustic resources, i.e., speech data with word-level transcriptions, and lexical resources, in which each word is transcribed in terms of subword units. Standard ASR systems typically use phonemes or phones as subword units. However, not all languages have well-developed acoustic and phonetic lexical resources. In this paper, we show that the relationship between lexical units and acoustic features can be factored into two parts through a latent variable, namely, an acoustic model and a lexical model. The acoustic model captures the relationship between latent variables and acoustic features, while the lexical model captures a probabilistic relationship between latent variables and lexical units. We elucidate that in standard hidden Markov model based ASR systems, the relationship between lexical units and latent variables is one-to-one and the lexical model is deterministic. Through a literature survey, we show that this deterministic lexical modeling imposes the need for well-developed acoustic and lexical resources from the target language or domain to build an ASR system. We then propose an approach that addresses both acoustic and phonetic lexical resource constraints in ASR system development. In the proposed approach, the latent variables are multilingual phones and the lexical units are graphemes of the target language or domain. We show that the acoustic model can be trained on domain-independent or language-independent resources, and that the lexical model, which models a probabilistic relationship between graphemes and multilingual phones, can be trained on a relatively small amount of transcribed speech data from the target domain or language. The potential and the efficacy of the proposed approach are demonstrated through experiments and comparisons with other approaches on three different ASR tasks: non-native and accented speech recognition, rapid development of an ASR system for a new language, and development of an ASR system for a minority language.
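As a rough sketch of the factorization described in the abstract (the notation here is assumed for illustration and is not taken from the paper): if x denotes an acoustic feature observation, l a lexical unit, and a ranges over the latent acoustic units, the modeled relationship can be decomposed as

p(x \mid l) = \sum_{a} \underbrace{p(x \mid a)}_{\text{acoustic model}} \, \underbrace{P(a \mid l)}_{\text{lexical model}}

where the acoustic model p(x | a) can be trained on language-independent resources and the lexical model P(a | l) on a small amount of target-language data.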
REPORT
Rasipuram_Idiap-RR-02-2014/IDIAP
Acoustic and Lexical Resource Constrained ASR using Language-Independent Acoustic Model and Language-Dependent Probabilistic Lexical Model
Rasipuram, Ramya
Magimai-Doss, Mathew
Automatic Speech Recognition
grapheme
Kullback-Leibler divergence based hidden Markov model
Lexical modeling
Lexicon
phoneme
EXTERNAL
https://publications.idiap.ch/attachments/reports/2014/Rasipuram_Idiap-RR-02-2014.pdf
PUBLIC
Idiap-RR-02-2014
2014
Idiap
March 2014
One of the key challenges involved in building a statistical automatic speech recognition (ASR) system is modeling the relationship between lexical units (which are based on subword units in the pronunciation lexicon) and acoustic feature observations. To model this relationship, two types of resources are needed, namely, acoustic resources (speech signals with word-level transcriptions) and lexical resources (in which each word is transcribed in terms of subword units). Standard ASR systems typically use phonemes or phones as subword units. However, not all languages have well-developed acoustic resources and phonetic lexical resources. In this paper, we show that modeling the relationship between lexical units and acoustic features can be factored into two parts through a latent variable, referred to as acoustic units, namely: (a) an acoustic model that models the relationship between acoustic features and acoustic units, and (b) a lexical model that models the relationship between lexical units and acoustic units. Through this understanding, we elucidate that in a standard hidden Markov model (HMM) based ASR system the lexical model is deterministic (i.e., there exists a one-to-one relationship between lexical units and acoustic units), and that it is this deterministic lexical model that imposes the need for well-developed acoustic and lexical resources in the target language or domain when building an ASR system. We then propose an approach that addresses both acoustic resource and lexical resource constraints. More specifically, in the proposed approach the acoustic model models the relationship between acoustic features and multilingual phones (acoustic units) on target-language-independent data, and the lexical model models a probabilistic relationship between grapheme-based lexical units and multilingual phones on a small amount of target language data. We show the potential and the efficacy of the proposed approach through experiments and comparisons with other approaches on three different ASR tasks, namely, non-native and accented speech recognition, rapid development of an ASR system for a new language, and development of an ASR system for a minority language.
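To illustrate the contrast between the deterministic and the probabilistic lexical model described above (the notation here is assumed for illustration and is not taken from the report): with lexical units l_i and acoustic units a_d, the lexical model can be viewed as a set of parameters y_{i,d} = P(a_d | l_i). In a standard HMM-based system this reduces to a one-to-one (Kronecker delta) assignment, whereas in the proposed approach each lexical unit carries a full distribution over multilingual phones, estimated from a relatively small amount of transcribed target-language speech:

y_{i,d} = P(a_d \mid l_i), \qquad \text{standard HMM: } y_{i,d} = \delta\big(a_d, a(l_i)\big), \qquad \text{proposed: } y_{i,d} \ge 0, \; \sum_{d} y_{i,d} = 1 .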