CONF
misr04/IDIAP
Spectral Entropy Based Feature for Robust ASR
Misra, Hemant
Ikbal, Shajith
Bourlard, Hervé
Hermansky, Hynek
EXTERNAL
https://publications.idiap.ch/attachments/reports/2003/rr03-56.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/misra-rr-03-56
Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
2004
Montreal, Canada
May 2004
IDIAP-RR 2003 56
In general, entropy gives a measure of the number of bits required to represent some information. When applied to a probability mass function (PMF), entropy can also be used to measure the "peakiness" of a distribution. In this paper, we propose using the entropy of the short-time Fourier transform spectrum, normalised as a PMF, as an additional feature for automatic speech recognition (ASR). A peaky spectrum, which represents clear formant structure in the case of voiced sounds, is expected to have low entropy, whereas a flatter spectrum, corresponding to non-speech or noisy regions, is expected to have higher entropy. Extending this reasoning further, we introduce multi-band/multi-resolution entropy features, in which the spectrum is divided into equal-size sub-bands and the entropy is computed in each sub-band. The results presented in this paper show that multi-band entropy features, used in conjunction with conventional cepstral features, improve the performance of the ASR system.
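The idea in the abstract can be illustrated with a minimal NumPy sketch; it is not the authors' implementation. Several choices here are assumptions not stated in the abstract: a Hann window, the power spectrum |X|^2 (rather than the magnitude) as the quantity normalised to a PMF, entropy in bits, a small `eps` to avoid log(0), and four sub-bands. The helper names `spectral_entropy` and `multiband_entropy` are hypothetical.

```python
import numpy as np


def _power_spectrum(frame: np.ndarray) -> np.ndarray:
    """Power spectrum of one Hann-windowed frame (window choice is an assumption)."""
    windowed = frame * np.hanning(len(frame))
    return np.abs(np.fft.rfft(windowed)) ** 2


def _entropy_bits(spectrum: np.ndarray, eps: float = 1e-12) -> float:
    """Normalise a non-negative spectrum to a PMF and return its entropy in bits."""
    pmf = spectrum / (spectrum.sum() + eps)
    # eps guards against log2(0); zero-probability bins contribute ~0.
    return float(-np.sum(pmf * np.log2(pmf + eps)))


def spectral_entropy(frame: np.ndarray) -> float:
    """Full-band spectral entropy: low for peaky spectra, high for flat ones."""
    return _entropy_bits(_power_spectrum(frame))


def multiband_entropy(frame: np.ndarray, n_bands: int = 4) -> np.ndarray:
    """One entropy value per sub-band; each sub-band is renormalised as its own PMF."""
    spectrum = _power_spectrum(frame)
    # array_split yields contiguous sub-bands of equal (or nearly equal) size.
    bands = np.array_split(spectrum, n_bands)
    return np.array([_entropy_bits(band) for band in bands])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, fs = 256, 8000
    t = np.arange(n) / fs
    tone = np.sin(2 * np.pi * 1000 * t)  # energy concentrated in a few bins
    noise = rng.standard_normal(n)       # roughly flat spectrum
    print(spectral_entropy(tone), spectral_entropy(noise))
    print(multiband_entropy(noise, n_bands=4))
```

A perfectly flat spectrum over N bins reaches the maximum entropy log2(N), while a pure tone puts almost all of its mass in a few bins and gives a value close to zero. In an ASR front end, one plausible way to use these values "in conjunction with" cepstral features is to compute them per frame and append them to the cepstral feature vector.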
REPORT
misra-rr-03-56/IDIAP
Spectral Entropy Based Feature for Robust ASR
Misra, Hemant
Ikbal, Shajith
Bourlard, Hervé
Hermansky, Hynek
EXTERNAL
https://publications.idiap.ch/attachments/reports/2003/rr03-56.pdf
PUBLIC
Idiap-RR-56-2003
2003
IDIAP
Martigny, Switzerland
in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004