CONF
More-Hert-Mayo99b/IDIAP
Data binarization by discriminant elimination
Moreira, Miguel
Hertz, Alain
Mayoraz, Eddy
Bruha, Ivan
Ed.
Bohanec, Marco
Ed.
EXTERNAL
https://publications.idiap.ch/attachments/reports/1999/rr99-04.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/more-hert-mayo99
Related documents
Proceedings of the ICML-99 Workshop: From Machine Learning to Knowledge Discovery in Databases
1999
51-60
IDIAP-RR 99-04
This paper is concerned with the problem of constructing a mapping from an arbitrary input space $\Input$ into a binary output space $\Bin^\BinDim$, based on a given data set $\DataSet \subset \Input$ partitioned into classes. The aim is to reduce the total amount of information, while keeping the most relevant of it for the partitioning. An additional constraint to our problem is that the mapping must have a simple interpretation. Thus, each of the $\BinDim$ discriminants is related to one original attribute (e.g. linear combinations of original attributes are not admitted). Beyond data compression, the targeted application is a preprocessing for classification techniques that require Boolean input data. While other existing techniques for this problem are constructive (increasing $\BinDim$ iteratively, such as decision trees), the method proposed here proceeds by starting with a very large dimension $\BinDim$, and by reducing it iteratively.
REPORT
More-Hert-Mayo99/IDIAP
Data binarization by discriminant elimination
Moreira, Miguel
Hertz, Alain
Mayoraz, Eddy
EXTERNAL
https://publications.idiap.ch/attachments/reports/1999/rr99-04.pdf
PUBLIC
Idiap-RR-04-1999
1999
IDIAP
This paper is concerned with the problem of constructing a mapping from an arbitrary input space $\Input$ into a binary output space $\Bin^\BinDim$, based on a given data set $\DataSet \subset \Input$ partitioned into classes. The aim is to reduce the total amount of information, while keeping the most relevant of it for the partitioning. An additional constraint to our problem is that the mapping must have a simple interpretation. Thus, each of the $\BinDim$ discriminants is related to one original attribute (e.g. linear combinations of original attributes are not admitted). Beyond data compression, the targeted application is a preprocessing for classification techniques that require Boolean input data. While other existing techniques for this problem are constructive (increasing $\BinDim$ iteratively, such as decision trees), the method proposed here proceeds by starting with a very large dimension $\BinDim$, and by reducing it iteratively.
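
The eliminative idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the candidate set or the elimination criterion, so everything below is an assumption made for illustration: each discriminant is taken to be a threshold on a single numeric attribute, the initial set contains one threshold between every pair of consecutive distinct values, and a discriminant is dropped greedily whenever the remaining ones still give different binary codes to points of different classes. The function names (`candidate_discriminants`, `binarize`, `is_consistent`, `eliminate`) are hypothetical and do not come from the paper.

```python
def candidate_discriminants(X):
    """Start large: one threshold between each pair of consecutive
    distinct values of every attribute (assumed candidate set)."""
    cands = []
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            cands.append((j, (lo + hi) / 2.0))
    return cands


def binarize(X, discriminants):
    """Map each point to a binary vector, one bit per discriminant.
    Each bit depends on a single original attribute."""
    return [[int(row[j] > t) for (j, t) in discriminants] for row in X]


def is_consistent(X, y, discriminants):
    """True if no two points of different classes share a binary code."""
    seen = {}
    for code, label in zip(binarize(X, discriminants), y):
        if seen.setdefault(tuple(code), label) != label:
            return False
    return True


def eliminate(X, y):
    """Greedy elimination (assumed criterion): drop a discriminant
    whenever the remaining ones still separate the classes."""
    kept = candidate_discriminants(X)
    for d in list(kept):
        trial = [k for k in kept if k != d]
        if is_consistent(X, y, trial):
            kept = trial
    return kept


# Toy data: two numeric attributes, two classes.
X = [[0.0, 5.0], [1.0, 5.5], [2.0, 4.0], [3.0, 4.5]]
y = [0, 0, 1, 1]
initial = candidate_discriminants(X)
kept = eliminate(X, y)
codes = binarize(X, kept)
```

On this toy data the six initial thresholds reduce to a single one on the second attribute. A greedy pass of this kind depends on the order in which discriminants are tried and is not guaranteed to reach a smallest possible set; the report itself should be consulted for the criterion the authors actually use.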