Idiap Research Institute
Location Based Speaker Segmentation
Type of publication: Idiap-RR
Citation: lathoud-rr-02-43
Number: Idiap-RR-43-2002
Year: 2002
Institution: IDIAP
Address: Martigny, Switzerland
Note: Published in Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-03)
Abstract: This paper proposes a technique that segments audio into speaker turns based on speaker location, essentially implementing a discrete source tracking system. In many multi-party conversations, such as meetings or teleconferences, the participants are restricted to a small number of regions, such as seats around a table. In such cases, segmentation according to these discrete regions is a reliable means of determining speaker turns. We propose a system that uses microphone pair time delays as features to represent speaker locations. A GMM/HMM framework is used to determine an optimal segmentation of the audio according to these locations. We also demonstrate how this approach is easily extended to more complex cases, such as the presence of two simultaneous speakers. Experiments on real recordings from a meeting room show that the proposed location features can provide greater discrimination than standard cepstral features, and also demonstrate that the extension successfully handles dual-speaker overlap.
Userfields: ipdinar={2002}, ipdmembership={speech}, language={English}
Projects Idiap
Authors Lathoud, Guillaume
McCowan, Iain A.
Crossref by lathoud03a
Attachments
  • rr-02-43.pdf
  • rr-02-43.ps.gz
Notes
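The segmentation idea summarized in the abstract (time-delay features decoded into a sequence of discrete locations) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single microphone pair yielding one time-delay value per frame, one Gaussian per location instead of a full GMM, a shared standard deviation, and a hypothetical `stay_prob` parameter that discourages rapid switching between locations. The function name `viterbi_segment` and all parameter values are illustrative only.

```python
import math


def viterbi_segment(delays, means, std, stay_prob=0.95):
    """Label each frame with the index of its most likely location.

    delays: non-empty list of per-frame time-delay features (assumed 1-D).
    means:  expected time delay for each candidate location (hypothetical values).
    std:    shared standard deviation of the delay around each mean.
    """
    n_states = len(means)
    if n_states < 2:
        raise ValueError("need at least two candidate locations")

    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))

    def log_emit(x, k):
        # Gaussian log-likelihood, up to a constant shared by all states.
        return -0.5 * ((x - means[k]) / std) ** 2

    # Uniform initial prior: its constant term does not affect the best path.
    score = [log_emit(delays[0], k) for k in range(n_states)]
    backptrs = []

    for x in delays[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for arriving in state j at this frame.
            best_i = max(
                range(n_states),
                key=lambda i: score[i] + (log_stay if i == j else log_switch),
            )
            trans = log_stay if best_i == j else log_switch
            new_score.append(score[best_i] + trans + log_emit(x, j))
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)

    # Backtrack from the best final state to recover the full path.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi_segment(delays, means=[-0.5, 0.5], std=0.1)` on a sequence of delays returns one location index per frame; runs of identical indices correspond to speaker turns. The transition penalty is what keeps an isolated noisy frame from producing a spurious turn change, at the cost of slightly delayed reaction to very short turns.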