<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">REPORT</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Morais_Idiap-Com-01-2023/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">A Bayesian approach to machine learning model comparison</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Morais, Antonio</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Anjos, André</subfield>
			<subfield code="e">Ed.</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Odobez, Jean-Marc</subfield>
			<subfield code="e">Ed.</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2="0">
			<subfield code="i">EXTERNAL</subfield>
			<subfield code="u">http://publications.idiap.ch/attachments/reports/2022/Morais_Idiap-Com-01-2023.pdf</subfield>
			<subfield code="x">PUBLIC</subfield>
		</datafield>
		<datafield tag="088" ind1=" " ind2=" ">
			<subfield code="a">Idiap-Com-01-2023</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2023</subfield>
			<subfield code="b">Idiap</subfield>
		</datafield>
		<datafield tag="771" ind1="2" ind2=" ">
			<subfield code="d">February 2023</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">Performance measures are an important component of machine learning algorithms. They are useful for evaluating the quality of a model, and also for helping the algorithm improve itself. Each need has its own metric. However, when the data set is small, these measures do not properly express the performance of the model. That is where confidence intervals and credible regions come in handy. Expressing the performance measures in a probabilistic setting lets us develop them as distributions, which we can then use to establish credible regions. We first address precision, recall and the F1-score, followed by accuracy, specificity and the Jaccard index, and study the coverage of the credible regions computed from the posterior distributions. We then discuss the ROC curve, the precision-recall curve and k-fold cross-validation. Finally, we conclude with a short discussion of what could be done with dependent samples.</subfield>
		</datafield>
	</record>
</collection>