<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">CONF</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Imseng_ICASSP_2014/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">Exploiting un-transcribed foreign data for speech recognition in well-resourced languages</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Imseng, David</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Potard, Blaise</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Motlicek, Petr</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Nanchen, Alexandre</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Bourlard, Hervé</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2="0">
			<subfield code="i">EXTERNAL</subfield>
			<subfield code="u">http://publications.idiap.ch/attachments/papers/2014/Imseng_ICASSP_2014.pdf</subfield>
			<subfield code="x">PUBLIC</subfield>
		</datafield>
		<datafield tag="711" ind1="2" ind2=" ">
			<subfield code="a">Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing</subfield>
			<subfield code="c">Florence, IT</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2014</subfield>
			<subfield code="b">IEEE</subfield>
		</datafield>
		<datafield tag="773" ind1=" " ind2=" ">
			<subfield code="c">2322 - 2326</subfield>
			<subfield code="x">1520-6149</subfield>
		</datafield>
		<datafield tag="024" ind1="7" ind2=" ">
			<subfield code="a">10.1109/ICASSP.2014.6854014</subfield>
			<subfield code="2">doi</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">Manual transcription of audio databases for automatic speech recognition (ASR) training is a costly and time-consuming process. State-of-the-art hybrid ASR systems that are based on deep neural networks (DNN) can exploit un-transcribed foreign data during unsupervised DNN pre-training or semi-supervised DNN training. We investigate the relevance of foreign data characteristics, in particular domain and language. 

Using three different datasets of the MediaParl and Ester databases, our experiments suggest that domain and language are equally important. Foreign data recorded under matched conditions (language and domain) yields the most improvement. The resulting ASR system yields about 5% relative improvement compared to the baseline system trained only on transcribed data. Our studies also reveal that the amount of foreign data used for semi-supervised training can be significantly reduced without degrading ASR performance if confidence-measure-based data selection is employed.</subfield>
		</datafield>
	</record>
</collection>