CONF
Luo_NIPS10_2010/IDIAP
Learning from Candidate Labeling Sets
Luo, Jie
Orabona, Francesco
EXTERNAL
https://publications.idiap.ch/attachments/papers/2011/Luo_NIPS10_2010.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Luo_Idiap-RR-27-2011
Related documents
NIPS Foundation - Advances in Neural Information Processing Systems 23 (NIPS10)
Vancouver, B.C., Canada
23
2010
MIT Press
December 2010
In many real-world applications we do not have access to fully labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However, most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problem. Each training sample is a bag containing multiple instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large-margin discriminative formulation and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world image-and-caption dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.
REPORT
Luo_Idiap-RR-27-2011/IDIAP
Learning from Candidate Labeling Sets
Luo, Jie
Orabona, Francesco
EXTERNAL
https://publications.idiap.ch/attachments/reports/2010/Luo_Idiap-RR-27-2011.pdf
PUBLIC
Idiap-RR-27-2011
2011
Idiap
August 2011
In many real-world applications we do not have access to fully labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However, most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problem. Each training sample is a bag containing multiple instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large-margin discriminative formulation and an efficient algorithm to solve it. Experiments conducted on artificial datasets and two image-and-caption datasets show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.