Idiap Research Institute
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks
Type of publication: Conference paper
Citation: Unnervik_NEURIPS_2024
Publication status: Accepted
Booktitle: NeurIPS Safe Generative AI Workshop 2024
Year: 2024
Crossref: https://arxiv.org/abs/2402.18718
Abstract: Backdoor attacks allow an attacker to embed a specific vulnerability in a machine learning algorithm, activated when an attacker-chosen pattern is presented, causing a specific misprediction. The need to identify backdoors in biometric scenarios has led us to propose a novel technique with different trade-offs. In this paper, we propose using model pairs on open-set classification tasks to detect backdoors. Using a simple linear operation to project embeddings from a probe model's embedding space to a reference model's embedding space, we can compare both embeddings and compute a similarity score. We show that this score can be an indicator of the presence of a backdoor, even when the models have different architectures and were trained independently on different datasets. This technique allows for the detection of backdoors in models designed for open-set classification tasks, a setting that is little studied in the literature. Additionally, we show that backdoors can be detected even when both models are backdoored. The source code is made available for reproducibility purposes.
Keywords:
Projects: TRESPASS-ETN
Authors: Unnervik, Alexander; Otroshi Shahreza, Hatef; George, Anjith; Marcel, Sébastien
Attachments
  • Unnervik_NEURIPS_2024.pdf
Notes
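The abstract describes projecting a probe model's embeddings into a reference model's embedding space with a linear operation and then scoring the pair by similarity. The following is a minimal sketch of that idea on synthetic data, not the authors' implementation: the least-squares fit, the cosine similarity, the function names (`fit_linear_translation`, `pair_scores`), the dimensions, and the simulated trigger behaviour are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_translation(probe_emb, ref_emb):
    """Fit W minimising ||probe_emb @ W - ref_emb|| (least squares).

    Assumption: an ordinary least-squares fit stands in for whatever
    fitting procedure the paper actually uses.
    """
    W, *_ = np.linalg.lstsq(probe_emb, ref_emb, rcond=None)
    return W


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) arrays."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_n * b_n, axis=1)


def pair_scores(probe_emb, ref_emb, W):
    """Translate probe embeddings into the reference space and score them."""
    return cosine_similarity(probe_emb @ W, ref_emb)


# Synthetic stand-ins for real model embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
n, d_probe, d_ref = 200, 16, 8
true_map = rng.normal(size=(d_probe, d_ref))

# Clean samples: both models agree up to a linear map.
probe_clean = rng.normal(size=(n, d_probe))
ref_clean = probe_clean @ true_map
W = fit_linear_translation(probe_clean, ref_clean)
clean_scores = pair_scores(probe_clean, ref_clean, W)

# Simulated trigger: the probe model collapses every input to one fixed
# embedding, while the reference model still embeds inputs normally.
probe_trig = np.tile(rng.normal(size=(1, d_probe)), (n, 1))
ref_trig = rng.normal(size=(n, d_probe)) @ true_map
trig_scores = pair_scores(probe_trig, ref_trig, W)

print("mean score, clean samples:    ", clean_scores.mean())
print("mean score, triggered samples:", trig_scores.mean())
```

In this toy setup, a low similarity score on a sample would flag a disagreement between the two models; choosing a decision threshold for real embeddings is left open here.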