CONF
dines:interspeech-icslp:2006/IDIAP
The segmentation of multi-channel meeting recordings for automatic speech recognition
Dines, John
Vepa, Jithendra
Hain, Thomas
EXTERNAL
https://publications.idiap.ch/attachments/papers/2006/dines-interspeech-icslp-2006.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/dines:rr06-22
Related documents
Int. Conf. on Spoken Language Processing (Interspeech ICSLP)
2006
Pittsburgh, USA
1213-1216
IDIAP-RR 06-22
One major research challenge in the analysis of meeting room data is the automatic transcription of what is spoken during meetings, a task that has gained considerable attention within the ASR research community through the NIST rich transcription evaluations conducted over the last three years. One of the major difficulties in carrying out automatic speech recognition (ASR) on these data is the challenging recording environment, which has motivated the development of novel audio pre-processing approaches. In this paper we present a system for the automatic segmentation of multi-channel individual headset microphone (IHM) meeting recordings for automatic speech recognition. The system relies on an MLP classifier, trained on several meeting room corpora, to identify the speech/non-speech segments of the recordings. We give a detailed analysis of the segmentation performance for a number of system configurations; our best system achieves ASR performance on automatically generated segments within 1.3% (3.7% relative) of a manual segmentation of the data.
REPORT
dines:rr06-22/IDIAP
The segmentation of multi-channel meeting recordings for automatic speech recognition
Dines, John
Vepa, Jithendra
Hain, Thomas
EXTERNAL
https://publications.idiap.ch/attachments/reports/2006/dines-idiap-rr-06-22.pdf
PUBLIC
Idiap-RR-22-2006
2006
IDIAP
Published in Interspeech-ICSLP 2006, Pittsburgh
One major research challenge in the analysis of meeting room data is the automatic transcription of what is spoken during meetings, a task that has gained considerable attention within the ASR research community through the NIST rich transcription evaluations conducted over the last three years. One of the major difficulties in carrying out automatic speech recognition (ASR) on these data is the challenging recording environment, which has motivated the development of novel audio pre-processing approaches. In this paper we present a system for the automatic segmentation of multi-channel individual headset microphone (IHM) meeting recordings for automatic speech recognition. The system relies on an MLP classifier, trained on several meeting room corpora, to identify the speech/non-speech segments of the recordings. We give a detailed analysis of the segmentation performance for a number of system configurations; our best system achieves ASR performance on automatically generated segments within 1.3% (3.7% relative) of a manual segmentation of the data.
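The abstract describes an MLP classifier that labels each frame of a recording as speech or non-speech, with the frame decisions then merged into segments for ASR. The following is a minimal sketch of that idea only — not the paper's actual system: the single log-energy-like feature, the network size, the training setup, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-frame features (stand-in for real acoustic features):
# alternating 25-frame speech / non-speech blocks, with speech frames
# drawn around a higher "energy" mean than non-speech frames.
n = 200
labels = (np.arange(n) % 50) < 25
X = (np.where(labels, 2.0, -2.0) + rng.normal(0, 0.5, n)).reshape(-1, 1)
y = labels.astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-hidden-layer MLP trained by full-batch gradient descent on
# binary cross-entropy (speech = 1, non-speech = 0).
W1 = rng.normal(0, 0.5, (1, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.5
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)                 # hidden activations
    p = sigmoid(h @ W2 + b2).ravel()         # P(speech | frame)
    d_out = (p - y).reshape(-1, 1) / n       # BCE gradient at the output
    dW2 = h.T @ d_out; db2 = d_out.sum(0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)      # backprop through tanh
    dW1 = X.T @ d_h; db1 = d_h.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

# Threshold frame posteriors, then merge runs of speech frames into
# (start, end) segments, as a segmenter would hand to the recognizer.
pred = p > 0.5

def to_segments(frames):
    segs, start = [], None
    for i, s in enumerate(frames):
        if s and start is None:
            start = i
        elif not s and start is not None:
            segs.append((start, i)); start = None
    if start is not None:
        segs.append((start, len(frames)))
    return segs

segments = to_segments(pred)
accuracy = float((pred == labels).mean())
```

A real system of the kind the abstract describes would operate on acoustic feature vectors per channel and typically apply additional smoothing (e.g. minimum-duration constraints) before segment boundaries are fixed; the run-merging step above is the simplest version of that post-processing.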