CONF Korshunov_AVFAKES_ICML_2019/IDIAP
Tampered Speaker Inconsistency Detection with Phonetically Aware Audio-visual Features
Korshunov, Pavel; Halstead, Michael; Castan, Diego; Graciarena, Martin; McLaren, Mitchell; Burns, Brian; Lawson, Aaron; Marcel, Sébastien
inconsistencies detection lip-syncing Video tampering
EXTERNAL https://publications.idiap.ch/attachments/papers/2019/Korshunov_AVFAKESICML_2019.pdf PUBLIC
International Conference on Machine Learning Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes
2019
Best paper award in ICML workshop "Synthetic Realities: Deep Learning for Detecting AudioVisual Fakes"

The recent increase in social media based propaganda, i.e., ‘fake news’, calls for automated methods to detect tampered content. In this paper, we focus on detecting tampering in a video with a person speaking to a camera. This form of manipulation is easy to perform, since one can just replace a part of the audio, dramatically changing the meaning of the video. We consider several detection approaches based on phonetic features and recurrent networks. We demonstrate that by replacing standard MFCC features with embeddings from a DNN trained for automatic speech recognition, combined with mouth landmarks (visual features), we can achieve a significant performance improvement on several challenging publicly available databases of speakers (VidTIMIT, AMI, and GRID), for which we generated sets of tampered data. The evaluations demonstrate a relative equal error rate reduction of 55% (to 4.5% from 10.0%) on the large GRID corpus based dataset and a satisfying generalization of the model on other datasets.
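
The 55% figure in the abstract is a relative reduction: (10.0 - 4.5) / 10.0 = 0.55. The sketch below illustrates that arithmetic, together with a simple threshold-sweep approximation of the equal error rate (EER). The function names, the score convention (a higher score means more likely genuine), and the sweep itself are assumptions made for illustration; they are not taken from the paper.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of an error rate relative to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


def equal_error_rate(genuine_scores, tampered_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption: higher score = more likely genuine (non-tampered).
    Returns the mean of the false acceptance and false rejection rates
    at the threshold where the two rates are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(tampered_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a genuine sample scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a tampered sample scored at or above it.
        far = sum(s >= t for s in tampered_scores) / len(tampered_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# EER values quoted in the abstract, in percent.
reduction = relative_reduction(10.0, 4.5)

# Made-up toy scores, only to exercise the function.
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

With the abstract's numbers, `reduction` evaluates to 0.55. The toy scores are invented and say nothing about the paper's data.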