Multi-scale sequential network for semantic text segmentation and localization

Type of publication:	Journal paper
Citation:	Villamizar_PRL_2020
Publication status:	Published
Journal:	Pattern Recognition Letters
Volume:	129
Year:	2020
Pages:	63-69
ISSN:	0167-8655
URL:	http://www.sciencedirect.com/s...
DOI:	https://doi.org/10.1016/j.patrec.2019.11.001
Abstract:	We present a novel method for semantic text document analysis which, in addition to localizing text, labels it with user-defined semantic categories. More precisely, it consists of a fully convolutional, sequential network that we apply to the particular case of slide analysis to detect titles, bullets, and standard text. Our contributions are twofold: (1) a multi-scale network consisting of a series of stages that sequentially refine the prediction of text and semantic labels (text, title, bullet); (2) a synthetic database of slide images with text and semantic annotations, used to train the network with abundant data and wide variability in text appearance, slide layouts, and noise such as compression artifacts. We evaluate our method on real slide images collected from multiple conferences and show that it localizes text with an accuracy of 95% and classifies titles and bullets with accuracies of 94% and 85%, respectively. In addition, we show that our method is competitive on scene and born-digital image datasets, such as ICDAR 2011, where it achieves an accuracy of 91.1%.
Keywords:
Projects:	Idiap
Authors:	Villamizar, Michael; Canévet, Olivier; Odobez, Jean-Marc
Attachments
Villamizar_PRL_2020.pdf