REPORT Poh_06_VR_theory_j_rr/IDIAP Towards Explaining the Success (Or Failure) of Fusion in Biometric Authentication Poh, Norman Bengio, Samy EXTERNAL http://publications.idiap.ch/attachments/reports/2005/rr05-43.pdf PUBLIC Idiap-RR-43-2005 2005 IDIAP Biometric authentication is the process of verifying an identity claim using a person's behavioral and physiological characteristics. Because such systems are vulnerable to environmental noise and to variation caused by the user, fusion of several biometric-enabled systems has been identified as a promising solution. In the literature, various fixed rules (e.g. min, max, median, mean) and trainable classifiers (e.g. linear combination of scores or weighted sum) are used to combine the scores of several base-systems. Although many empirical experiments have been reported, few works study a wide range of factors that can affect fusion performance; most consider these factors in isolation. Some of these factors are: 1) dependency among features to be combined, 2) the choice of fusion classifier/operator, 3) the choice of decision threshold, 4) the relative base-system performance, 5) the presence of noise (or the degree of robustness of classifiers to noise), and 6) the type of classifier output. To understand these factors, we propose to model the Equal Error Rate (EER), a commonly used performance measure in biometric authentication. Tackling factors 1--5 implies that the use of class conditional Gaussian distribution is imperative, at least to begin with. When the class conditional scores (client or impostor) to be combined are based on a multivariate Gaussian, factors 1, 3, 4 and 5 can be readily modeled. The challenge now lies in establishing the missing link between EER and the fusion classifier mentioned above.
Based on the EER framework, we can even derive this missing link for non-linear fusion classifiers, a proposal that, to the best of our knowledge, has not been investigated before. The principal difference between the theoretical EER model proposed here and previous studies in this direction is that scores are treated as log-likelihood ratios (of client versus impostor) and the decision threshold is treated as a prior (or log-prior ratio). In the previous studies, scores are treated as posterior probabilities, so the role of the adjustable threshold as a prior adjustment parameter is somewhat less emphasized. When the EER models (especially those based on Order Statistics) cannot adequately address factors 1 and 4, we rely on simulations, which are relatively easy to carry out and whose results can be interpreted more easily. There are, however, several issues left untreated in the EER models, namely 1) what if the scores are known not to be approximately normally distributed (for instance, Multi-Layer Perceptron outputs); 2) what if the scores of the classifiers to be combined are not comparable in range (their distributions differ from each other); 3) how to evaluate performance measures other than EER. For issue 1, we propose to reverse the squashing function so that the data (scores) are once again approximately normal. For issue 2, some score normalization procedures are also proposed, namely F-ratio normalization (F-Norm) and margin normalization. F-Norm has the advantage that scores are made comparable while the relative shape of the distribution remains the same. Margin normalization has the advantage that no free parameter is required and the transformation relies entirely on the class conditional scores. Finally, although the Gaussian assumption is central to this work, we show that it is possible to remove this assumption by modeling the scores to be combined with a mixture of Gaussians.
Some 1186 BANCA experiments verify that this approach can estimate the system performance better than the Gaussian assumption does.
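To make the Gaussian EER model concrete, the sketch below shows how, under the class conditional Gaussian assumption, the EER follows from the client and impostor means and standard deviations, and how a weighted sum of two correlated base-systems changes it. This is an illustration, not the report's own code: the closed form used is the standard consequence of equating false acceptance and false rejection for two Gaussians, all function names are hypothetical, and the use of a single correlation value rho for both classes is a simplifying assumption.

```python
import math


def f_ratio(mu_client, sigma_client, mu_impostor, sigma_impostor):
    """Class separability: mean gap divided by the sum of standard deviations."""
    return (mu_client - mu_impostor) / (sigma_client + sigma_impostor)


def gaussian_eer(mu_client, sigma_client, mu_impostor, sigma_impostor):
    """EER when client and impostor scores are each Gaussian.

    Setting FAR equal to FRR gives EER = 1/2 - 1/2 * erf(F / sqrt(2)),
    where F is the F-ratio defined above.
    """
    f = f_ratio(mu_client, sigma_client, mu_impostor, sigma_impostor)
    return 0.5 - 0.5 * math.erf(f / math.sqrt(2.0))


def fuse_two_systems(mus, sigmas, rho, weights=(0.5, 0.5)):
    """Mean and standard deviation of a weighted sum of two Gaussian scores.

    mus and sigmas hold the per-system values for ONE class;
    rho is the correlation between the two systems' scores (assumed > -1).
    """
    w1, w2 = weights
    mu = w1 * mus[0] + w2 * mus[1]
    var = ((w1 * sigmas[0]) ** 2 + (w2 * sigmas[1]) ** 2
           + 2.0 * rho * w1 * w2 * sigmas[0] * sigmas[1])
    return mu, math.sqrt(var)


def fused_eer(client, impostor, rho, weights=(0.5, 0.5)):
    """EER of the fused score; client and impostor are (mus, sigmas) pairs."""
    mu_c, sd_c = fuse_two_systems(client[0], client[1], rho, weights)
    mu_i, sd_i = fuse_two_systems(impostor[0], impostor[1], rho, weights)
    return gaussian_eer(mu_c, sd_c, mu_i, sd_i)


# Two identical, hypothetical base-systems: client ~ N(1, 1), impostor ~ N(-1, 1).
client = ((1.0, 1.0), (1.0, 1.0))
impostor = ((-1.0, -1.0), (1.0, 1.0))

base = gaussian_eer(1.0, 1.0, -1.0, 1.0)
for rho in (0.0, 0.5, 1.0):
    print(f"rho={rho:.1f}  base EER={base:.4f}  mean-rule EER={fused_eer(client, impostor, rho):.4f}")
```

In this toy setting the mean rule lowers the EER when the two systems are uncorrelated and gives no gain when they are perfectly correlated, which illustrates factor 1 (dependency) under the stated assumptions.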