REPORT astrid-00-22a/IDIAP Using Multiple Time Scales in the Framework of Multi-Stream Speech Recognition Hagen, Astrid Bourlard, Hervé difference features full combination HMM/ANN-Hybrid multi-stream multiple time scales EXTERNAL http://publications.idiap.ch/attachments/reports/2000/rr00-22.pdf PUBLIC Idiap-RR-22-2000 2000 IDIAP Martigny, Switzerland July 2000 Published: ICSLP 2000, Beijing, September 2000 In this paper, we present a new approach to incorporating multiple time scale information as independent streams in multi-stream processing. To illustrate the procedure, we take two different sets of multiple time scale features. In the first system, these are features extracted over variable-sized windows of three and five times the original window size. In the second system, we take as separate input streams the commonly used difference features, i.e. the first- and second-order derivatives of the instantaneous features. In the same way, any other kind of multiple time scale features could be employed. The approach is embedded in the recently introduced "full combination" approach to multi-stream processing, in which the phoneme probabilities from all possible combinations of streams are combined in a weighted sum. As an extension of this approach, we have found that replacing the sum of probabilities by their product, in the same "all wise" context, can result in higher robustness. Capturing different information in each stream, and with the longer time scale features being more robust to noise, the multiple time scale multi-stream system achieved a significant performance improvement in both clean speech and real environmental noise.

<subfield code="a">REPORT</subfield>

</datafield>

<subfield code="a">astrid-00-22a/IDIAP</subfield>

</datafield>

<subfield code="a">Using Multiple Time Scales in the Framework of Multi-Stream Speech Recognition</subfield>

</datafield>

<subfield code="a">Hagen, Astrid</subfield>

</datafield>

<subfield code="a">Bourlard, Hervé</subfield>

</datafield>

<subfield code="a">difference features</subfield>

</datafield>

<subfield code="a">full combination</subfield>

</datafield>

<subfield code="a">HMM/ANN-Hybrid</subfield>

</datafield>

<subfield code="a">multi-stream</subfield>

</datafield>

<subfield code="a">multiple time scales</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/reports/2000/rr00-22.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="a">Idiap-RR-22-2000</subfield>

</datafield>

<subfield code="b">IDIAP</subfield>

<subfield code="a">Martigny, Switzerland</subfield>

</datafield>

</datafield>

<subfield code="a">Published: ICSLP 2000, Beijing, September 2000</subfield>

</datafield>

<subfield code="a">In this paper, we present a new approach to incorporating multiple time scale information as independent streams in multi-stream processing. To illustrate the procedure, we take two different sets of multiple time scale features. In the first system, these are features extracted over variable-sized windows of three and five times the original window size. In the second system, we take as separate input streams the commonly used difference features, i.e. the first- and second-order derivatives of the instantaneous features. In the same way, any other kind of multiple time scale features could be employed. The approach is embedded in the recently introduced "full combination" approach to multi-stream processing, in which the phoneme probabilities from all possible combinations of streams are combined in a weighted sum. As an extension of this approach, we have found that replacing the sum of probabilities by their product, in the same "all wise" context, can result in higher robustness. Capturing different information in each stream, and with the longer time scale features being more robust to noise, the multiple time scale multi-stream system achieved a significant performance improvement in both clean speech and real environmental noise.</subfield>

</datafield>

</record>

</collection>