CONF
Taghizadeh_HSCMA_2011/IDIAP
An Integrated Framework for Multi-Channel Multi-Source Localization and Voice Activity Detection
Taghizadeh, Mohammad J.
Garner, Philip N.
Bourlard, Hervé
Abutalebi, Hamid Reza
Asaei, Afsaneh
EXTERNAL
https://publications.idiap.ch/attachments/papers/2011/Taghizadeh_HSCMA_2011.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Taghizadeh_Idiap-RR-16-2011
Related documents
The Third Joint Workshop on Hands-free Speech Communication and Microphone Arrays
2011
Two of the major challenges in microphone-array-based adaptive beamforming, speech enhancement and distant speech recognition are robust and accurate source localization and voice activity detection. This paper introduces a spatial gradient steered response power using the phase transform (SRP-PHAT) method that is capable of localizing competing speakers in overlapping conditions. We further investigate the behaviour of the SRP function and theoretically characterize a fixed point in its search space for the diffuse noise field. We call this fixed point the null position in the SRP search space. Building on this evidence, we propose a technique for multi-channel voice activity detection (MVAD) based on detection of a maximum power corresponding to the null position. The gradient SRP-PHAT in tandem with the MVAD forms an integrated framework of multi-source localization and voice activity detection. The experiments carried out on real data recordings show that this framework is very effective in practical applications of hands-free communication.
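For readers unfamiliar with SRP-PHAT, the following is a minimal sketch of the *standard* frequency-domain SRP-PHAT score. It is not the gradient SRP-PHAT or the null-position MVAD proposed in the paper. For each candidate location, the PHAT-weighted cross-spectra of all microphone pairs are steered by the expected time difference of arrival and summed, so the true location adds coherently. The function names (`dft`, `srp_phat`, `delays_from_geometry`), the free-field propagation model, the speed of sound `c=343.0` m/s, and the single-frame processing are illustrative assumptions. A naive DFT keeps the example dependency-free.

```python
import cmath
import math
import random
from itertools import combinations


def dft(x):
    """Naive O(N^2) DFT; a real system would use an FFT library instead."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def delays_from_geometry(source, mics, fs, c=343.0):
    """Free-field propagation delay (in samples) from a candidate point to each mic.

    Assumption: point source, straight-line propagation at speed c (m/s).
    """
    return [math.dist(source, m) / c * fs for m in mics]


def srp_phat(frames, candidate_delays, eps=1e-12):
    """Standard SRP-PHAT score for each candidate location (hypothetical helper).

    frames: one equal-length frame of real samples per microphone.
    candidate_delays: per candidate, the delay (samples) to each microphone.
    """
    n = len(frames[0])
    spectra = [dft(f) for f in frames]
    # Signed bin index so fractional delays use the correct negative frequencies.
    bins = [k if k <= n // 2 else k - n for k in range(n)]
    # PHAT weighting: keep only the phase of each pairwise cross-spectrum.
    cross = {}
    for i, j in combinations(range(len(frames)), 2):
        g = [spectra[i][k] * spectra[j][k].conjugate() for k in range(n)]
        cross[(i, j)] = [v / (abs(v) + eps) for v in g]
    powers = []
    for delays in candidate_delays:
        p = 0.0
        for (i, j), g in cross.items():
            tdoa = delays[i] - delays[j]
            # Undo the expected phase difference; aligned pairs sum coherently.
            p += sum((g[k] * cmath.exp(2j * math.pi * bins[k] * tdoa / n)).real
                     for k in range(n))
        powers.append(p)
    return powers


if __name__ == "__main__":
    random.seed(0)
    n = 64
    s = [random.gauss(0.0, 1.0) for _ in range(n)]
    true_delays = [0, 3, 5]
    # Simulate three mics as circular integer shifts of one source signal.
    frames = [[s[(t - d) % n] for t in range(n)] for d in true_delays]
    candidates = [[0, 3, 5], [0, 0, 0], [1, 3, 5], [5, 3, 0]]
    scores = srp_phat(frames, candidates)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(best)  # the true delay set (index 0) should score highest
```

In the demo, the matching candidate scores roughly N per microphone pair because every steered, phase-only bin equals one. A candidate that matches only one pair's TDOA scores about N. In practice `candidate_delays` would come from `delays_from_geometry` over a grid of positions, and the grid point with the largest score is taken as the source estimate.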
REPORT
Taghizadeh_Idiap-RR-16-2011/IDIAP
An Integrated Framework for Multi-Channel Multi-Source Localization and Voice Activity Detection
Taghizadeh, Mohammad J.
Garner, Philip N.
Bourlard, Hervé
Abutalebi, Hamid Reza
Asaei, Afsaneh
EXTERNAL
https://publications.idiap.ch/attachments/reports/2011/Taghizadeh_Idiap-RR-16-2011.pdf
PUBLIC
Idiap-RR-16-2011
2011
Idiap
June 2011
Two of the major challenges in microphone-array-based adaptive beamforming, speech enhancement and distant speech recognition are robust and accurate source localization and voice activity detection. This paper introduces a spatial gradient steered response power using the phase transform (SRP-PHAT) method that is capable of localizing competing speakers in overlapping conditions. We further investigate the behavior of the SRP function and theoretically characterize a fixed point in its search space for the diffuse noise field. We call this fixed point the null position in the SRP search space. Building on this evidence, we propose a technique for multi-channel voice activity detection (MVAD) based on detection of a maximum power corresponding to the null position. The gradient SRP-PHAT in tandem with the MVAD forms an integrated framework of multi-source localization and voice activity detection. The experiments carried out on real data recordings show that this framework is very effective in practical applications of hands-free communication.