Towards directly modeling raw speech signal for speaker verification using CNNs
Type of publication: | Conference paper |
Citation: | Muckenhirn_ICASSP_2018 |
Publication status: | Published |
Booktitle: | IEEE International Conference on Acoustics, Speech and Signal Processing |
Year: | 2018 |
Month: | April |
Pages: | 4884-4888 |
Location: | Calgary, Canada |
ISBN: | 978-1-5386-4658-8 |
Crossref: | Muckenhirn_Idiap-RR-30-2017 |
Abstract: | Speaker verification systems traditionally extract and model cepstral features or filter bank energies from the speech signal. In this paper, inspired by the success of neural network-based approaches that directly model the raw speech signal for applications such as speech recognition, emotion recognition and anti-spoofing, we propose a speaker verification approach in which speaker discriminative information is learned directly from the speech signal by: (a) first training a CNN-based speaker identification system that takes the raw speech signal as input and learns to classify speakers (unknown to the speaker verification system); and then (b) building a speaker detector for each speaker in the speaker verification system by replacing the output layer of the speaker identification system with two outputs (genuine, impostor), and adapting the system in a discriminative manner with enrollment speech of the speaker and impostor speech data. Our investigations on the Voxforge database show that this approach can yield systems competitive with state-of-the-art systems. An analysis of the filters in the first convolution layer shows that the filters emphasize information in low frequency regions (below 1000 Hz) and implicitly learn to model fundamental frequency information in the speech signal for speaker discrimination. |
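The two-stage recipe in the abstract can be sketched as follows. This is a hypothetical, heavily simplified toy (pure Python, no training loop, illustrative layer sizes), not the paper's actual architecture: a small 1-D convolution over raw samples feeds a pooled embedding and an N-way speaker output layer, and `to_detector` performs step (b) by swapping that layer for a 2-way (genuine, impostor) layer while keeping the learned convolution filters.

```python
import random

def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation) over raw samples."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    return [max(0.0, x) for x in xs]

class RawSpeechCNN:
    """Toy stand-in for a raw-waveform CNN: one conv filter, global
    mean/max pooling to a fixed-size embedding, linear output layer."""
    FEAT_DIM = 2  # mean- and max-pooled conv activations

    def __init__(self, n_speakers, kernel_len=5, seed=0):
        rng = random.Random(seed)
        self.kernel = [rng.gauss(0.0, 0.1) for _ in range(kernel_len)]
        self.out_w = [[rng.gauss(0.0, 0.1) for _ in range(self.FEAT_DIM)]
                      for _ in range(n_speakers)]
        self.out_b = [0.0] * n_speakers

    def embed(self, signal):
        acts = relu(conv1d(signal, self.kernel))
        return [sum(acts) / len(acts), max(acts)]

    def forward(self, signal):
        """One score per output unit (speakers, or genuine/impostor)."""
        f = self.embed(signal)
        return [sum(w * x for w, x in zip(row, f)) + b
                for row, b in zip(self.out_w, self.out_b)]

    def to_detector(self, seed=1):
        """Step (b): keep the conv filters, replace the N-way output
        layer with a fresh 2-way (genuine, impostor) layer, which would
        then be adapted on enrollment and impostor data."""
        rng = random.Random(seed)
        self.out_w = [[rng.gauss(0.0, 0.1) for _ in range(self.FEAT_DIM)]
                      for _ in range(2)]
        self.out_b = [0.0, 0.0]

# Usage: 10-speaker identification net, then conversion to a detector.
net = RawSpeechCNN(n_speakers=10)
sig = [random.Random(2).gauss(0.0, 1.0) for _ in range(100)]  # fake raw samples
id_scores = net.forward(sig)       # 10 speaker scores
net.to_detector()
det_scores = net.forward(sig)      # 2 scores: genuine, impostor
```

In the paper the adaptation after the layer swap is discriminative (trained on enrollment and impostor speech); here only the structural swap is shown.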
Keywords: | Convolutional neural network, End-to-end learning, Fundamental frequency, recognition, speaker verification |
Projects: | Idiap, UNITS |
Authors: | Muckenhirn, H., Magimai-Doss, M., Marcel, S. |