CONF
gatica05a-conf/IDIAP
Detecting Group Interest-level in Meetings
Gatica-Perez, Daniel
McCowan, Iain A.
Zhang, Dong
Bengio, Samy
EXTERNAL
https://publications.idiap.ch/attachments/reports/2004/gatica-rr-04-51.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/gatica-rr-04-51
Related documents
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
2005
Finding relevant segments in meeting recordings is important for summarization, browsing, and retrieval purposes. In this paper, we define relevance as the interest-level that meeting participants manifest as a group during the course of their interaction (as perceived by an external observer), and investigate the automatic detection of segments of high-interest from audio-visual cues. This is motivated by the assumption that there is a relationship between segments of interest to participants, and those of interest to the end user, e.g. of a meeting browser. We first address the problem of human annotation of group interest-level. On a 50-meeting corpus, recorded in a room equipped with multiple cameras and microphones, we found that the annotations generated by multiple people exhibit a good degree of consistency, providing a stable ground-truth for automatic methods. For the automatic detection of high-interest segments, we investigate a methodology based on Hidden Markov Models (HMMs) and a number of audio and visual features. Single- and multi-stream approaches were studied. Using precision and recall as performance measures, the results suggest that the automatic detection of group interest-level is promising, and that while audio in general constitutes the predominant modality in meetings, the use of a multi-modal approach is beneficial.
REPORT
gatica-rr-04-51/IDIAP
Detecting Group Interest-level in Meetings
Gatica-Perez, Daniel
McCowan, Iain A.
Zhang, Dong
Bengio, Samy
EXTERNAL
https://publications.idiap.ch/attachments/reports/2004/gatica-rr-04-51.pdf
PUBLIC
Idiap-RR-51-2004
2004
IDIAP
Martigny, Switzerland
Submitted for publication.
Finding relevant segments in meeting recordings is important for summarization, browsing, and retrieval purposes. In this paper, we define relevance as the interest-level that meeting participants manifest as a group during the course of their interaction (as perceived by an external observer), and investigate the automatic detection of segments of high-interest from audio-visual cues. This is motivated by the assumption that there is a relationship between segments of interest to participants, and those of interest to the end user, e.g. of a meeting browser. We first address the problem of human annotation of group interest-level. On a 50-meeting corpus, recorded in a room equipped with multiple cameras and microphones, we found that the annotations generated by multiple people exhibit a good degree of consistency, providing a stable ground-truth for automatic methods. For the automatic detection of high-interest segments, we investigate a methodology based on Hidden Markov Models (HMMs) and a number of audio and visual features. Single- and multi-stream approaches were studied. Using precision and recall as performance measures, the results suggest that (i) the automatic detection of group interest-level is promising, and (ii) while audio in general constitutes the predominant modality in meetings, the use of a multi-modal approach is beneficial.