CONF More-Hert-Mayo99b/IDIAP
Data binarization by discriminant elimination
Moreira, Miguel; Hertz, Alain; Mayoraz, Eddy
Bruha, Ivan (Ed.); Bohanec, Marco (Ed.)
Proceedings of the ICML-99 Workshop: From Machine Learning to Knowledge Discovery in Databases, 1999, pp. 51-60. Also available as IDIAP-RR 99-04.
https://publications.idiap.ch/attachments/reports/1999/rr99-04.pdf
https://publications.idiap.ch/index.php/publications/showcite/more-hert-mayo99

This paper is concerned with the problem of constructing a mapping from an arbitrary input space $\mathcal{X}$ into a binary output space $\{0,1\}^d$, based on a given data set $\mathcal{D} \subset \mathcal{X}$ partitioned into classes. The aim is to reduce the total amount of information while retaining the information most relevant to the partition into classes. An additional constraint is that the mapping must have a simple interpretation: each of the $d$ discriminants depends on a single original attribute (for example, linear combinations of original attributes are not allowed). Beyond data compression, the targeted application is preprocessing for classification techniques that require Boolean input data. Whereas other existing techniques for this problem are constructive (they increase $d$ iteratively, as decision trees do), the method proposed here starts with a very large dimension $d$ and reduces it iteratively.

REPORT More-Hert-Mayo99/IDIAP
Data binarization by discriminant elimination
Moreira, Miguel; Hertz, Alain; Mayoraz, Eddy
Idiap-RR-04-1999, IDIAP, 1999
https://publications.idiap.ch/attachments/reports/1999/rr99-04.pdf
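The abstract does not specify how the initial discriminants are generated or which one is eliminated at each step, so the following Python block is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes numeric attributes, takes each discriminant to be a threshold test `x[j] > t` on one attribute, builds the "very large" starting set from one threshold between every pair of consecutive distinct values, and drops a discriminant whenever the remaining ones still give different binary codes to points of different classes. All function names (`candidate_discriminants`, `binarize`, `is_consistent`, `eliminate_discriminants`) are hypothetical, and candidates are visited in plain list order rather than by any relevance criterion.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical representation: (attribute index j, threshold t) encodes the test x[j] > t.
Discriminant = Tuple[int, float]


def candidate_discriminants(X: Sequence[Sequence[float]]) -> List[Discriminant]:
    """Assumed starting set: a threshold midway between every pair of
    consecutive distinct values of every attribute."""
    cands: List[Discriminant] = []
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        cands.extend((j, (a + b) / 2) for a, b in zip(values, values[1:]))
    return cands


def binarize(X: Sequence[Sequence[float]], discs: List[Discriminant]) -> List[Tuple[int, ...]]:
    """Map each row to a 0/1 vector with one component per discriminant."""
    return [tuple(int(row[j] > t) for j, t in discs) for row in X]


def is_consistent(X, y: Sequence[Hashable], discs: List[Discriminant]) -> bool:
    """True if no two rows of different classes share the same binary code."""
    label_of: Dict[Tuple[int, ...], Hashable] = {}
    for code, label in zip(binarize(X, discs), y):
        if label_of.setdefault(code, label) != label:
            return False
    return True


def eliminate_discriminants(X, y) -> List[Discriminant]:
    """Start from all candidates and remove each one whose removal keeps the
    classes separated. Candidates are tried in list order (an assumption).
    One pass suffices: removing discriminants can only merge codes, so a
    discriminant that cannot be dropped now cannot be dropped later either."""
    discs = candidate_discriminants(X)
    if not is_consistent(X, y, discs):
        raise ValueError("rows of different classes have identical attribute values")
    i = 0
    while i < len(discs):
        trial = discs[:i] + discs[i + 1:]
        if is_consistent(X, y, trial):
            discs = trial  # discriminant i is redundant: drop it
        else:
            i += 1  # discriminant i is needed: keep it and move on
    return discs
```

On the toy data `X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]` with classes `[0, 0, 1, 1]`, this list-order pass keeps the three tests `(1, 2.0)`, `(1, 3.5)`, `(1, 4.5)` on attribute 1, although the single test `x[0] > 2.5` already separates the classes. The order in which discriminants are eliminated therefore matters, and a practical method would rank candidates by some measure of relevance rather than by position.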