Learning from Images with Captions Using the Maximum Margin Set Algorithm

Type of publication:	Idiap-RR
Citation:	Luo_Idiap-RR-30-2011
Number:	Idiap-RR-30-2011
Year:	2011
Month:	8
Institution:	Idiap
Abstract:	A large number of images with accompanying text captions are available on the Internet. Such data are valuable for training visual classifiers without any explicit manual intervention. In this paper, we present a general framework for learning from such captioned images. Under this framework, each training image is represented as a bag of regions, associated with a set of candidate labeling vectors. Each labeling vector encodes one possible assignment of labels to the regions of the image. The set of all possible labeling vectors can be generated automatically from the caption using natural language processing techniques. The use of labeling vectors provides a principled way to include diverse information from the captions, such as multiple types of words corresponding to different attributes of the same image region, labeling constraints derived from grammatical connections between words, uniqueness constraints, and spatial position indicators. It can also be used to incorporate high-level domain knowledge useful for improving learning performance. We show that learning is possible under this weakly supervised setup. Exploiting this property of the problem, we propose a large margin discriminative formulation and an efficient algorithm to solve the resulting learning problem. Experiments on artificial datasets and on two real-world datasets of images with captions support our claims.
Keywords:
Projects:	Idiap
Authors:	Luo, Jie; Orabona, Francesco; Caputo, Barbara; Ferrari, Vittorio
Attachments
Luo_Idiap-RR-30-2011.pdf (MD5: 3e02c8efccb5b21327e63593966aabbd)