Tampered Speaker Inconsistency Detection with Phonetically Aware Audio-visual Features

Type of publication:	Conference paper
Citation:	Korshunov_AVFAKES_ICML_2019
Publication status:	Published
Booktitle:	International Conference on Machine Learning
Series:	Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes
Year:	2019
Month:	July
Note:	Best paper award in ICML workshop "Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes"
Abstract:	The recent increase in social media based propaganda, i.e., ‘fake news’, calls for automated methods to detect tampered content. In this paper, we focus on detecting tampering in a video with a person speaking to a camera. This form of manipulation is easy to perform, since one can just replace a part of the audio, dramatically changing the meaning of the video. We consider several detection approaches based on phonetic features and recurrent networks. We demonstrate that by replacing standard MFCC features with embeddings from a DNN trained for automatic speech recognition, combined with mouth landmarks (visual features), we can achieve a significant performance improvement on several challenging publicly available databases of speakers (VidTIMIT, AMI, and GRID), for which we generated sets of tampered data. The evaluations demonstrate a relative equal error rate reduction of 55% (to 4.5% from 10.0%) on the large GRID corpus based dataset and a satisfying generalization of the model on other datasets.
Keywords:	inconsistency detection, lip-syncing, video tampering
Projects:	Idiap SAVI
Authors:	Korshunov, Pavel; Halstead, Michael; Castan, Diego; Graciarena, Martin; McLaren, Mitchell; Burns, Brian; Lawson, Aaron; Marcel, Sébastien
Attachments
Korshunov_AVFAKESICML_2019.pdf
Notes
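The abstract reports a relative equal error rate (EER) reduction of 55%, from 10.0% to 4.5%, on the GRID-based dataset. The short sketch below only checks that arithmetic; the function name relative_reduction is a hypothetical helper introduced for illustration and is not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction of `improved` with respect to `baseline`.

    Both arguments are error rates in the same unit (here, percent EER).
    Hypothetical helper for illustration; not from the paper.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# EER values quoted in the abstract (percent).
baseline_eer = 10.0
improved_eer = 4.5

reduction = relative_reduction(baseline_eer, improved_eer)
print(f"Relative EER reduction: {reduction:.0%}")
```

The absolute reduction is 5.5 percentage points; dividing by the 10.0% baseline gives the 55% relative figure quoted in the abstract.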
