CONF
vivek-rr-03-32b/IDIAP
On Factorizing Spectral Dynamics for Robust Speech Recognition
Tyagi, Vivek
McCowan, Iain A.
Bourlard, Hervé
Misra, Hemant
EXTERNAL
https://publications.idiap.ch/attachments/reports/2003/mcms_rr.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/vivek-rr-03-32
Related documents
Eurospeech
2003
IDIAP RR 03-32
In this paper, we introduce new dynamic speech features based on the modulation spectrum. These features, termed Mel-cepstrum Modulation Spectrum (MCMS), map the time trajectories of the spectral dynamics into a series of slow and fast moving orthogonal components, providing a more general and discriminative range of dynamic features than traditional delta and acceleration features. The features can be seen as the outputs of an array of band-pass filters spread over the cepstral modulation frequency range of interest. In experiments, it is shown that, as well as providing a slight improvement in clean conditions, these new dynamic features yield a significant increase in speech recognition performance in various noise conditions when compared directly to the standard temporal derivative features and RASTA-PLP features.
REPORT
vivek-rr-03-32/IDIAP
On Factorizing Spectral Dynamics for Robust Speech Recognition
Tyagi, Vivek
McCowan, Iain A.
Bourlard, Hervé
Misra, Hemant
EXTERNAL
https://publications.idiap.ch/attachments/reports/2003/mcms_rr.pdf
PUBLIC
Idiap-RR-32-2003
2003
IDIAP
In Proceedings of Eurospeech 2003
In this paper, we introduce new dynamic speech features based on the modulation spectrum. These features, termed Mel-cepstrum Modulation Spectrum (MCMS), map the time trajectories of the spectral dynamics into a series of slow and fast moving orthogonal components, providing a more general and discriminative range of dynamic features than traditional delta and acceleration features. The features can be seen as the outputs of an array of band-pass filters spread over the cepstral modulation frequency range of interest. In experiments, it is shown that, as well as providing a slight improvement in clean conditions, these new dynamic features yield a significant increase in speech recognition performance in various noise conditions when compared directly to the standard temporal derivative features and RASTA-PLP features.