Idiap Research Institute
An Agent-Based Focused Crawling Framework for Topic- and Genre-Related Web Document Discovery
Type of publication: Conference paper
Citation: Pappas_ICTAI_2012
Publication status: Published
Booktitle: 24th IEEE International Conference on Tools with Artificial Intelligence
Year: 2012
Month: August
Publisher: IEEE
Location: Athens, Greece
URL: http://ieeexplore.ieee.org/xpl...
Abstract: The discovery of web documents about certain topics is an important task for web-based applications, including web document retrieval, opinion mining and knowledge extraction. In this paper, we propose an agent-based focused crawling framework able to retrieve topic- and genre-related web documents. Starting from a simple topic query, a set of focused crawler agents explore topic-specific web paths in parallel, using dynamic seed URLs that belong to certain web genres and are collected from web search engines. The agents use an internal mechanism that weighs the topic and genre relevance scores of unvisited web pages. They are able to adapt to the properties of a given topic by modifying their internal knowledge during search, handle ambiguous queries, ignore pages that are irrelevant to the topic, and collaboratively retrieve topic-relevant web pages. We performed an experimental study to evaluate the behavior of the agents for a variety of topic queries, demonstrating the benefits and capabilities of our framework.
Projects: Idiap
Authors: Pappas, Nikolaos
Katsimpras, Georgios
Stamatatos, Efstathios
  • Pappas_ICTAI_2012.pdf
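
The abstract mentions an internal mechanism that weighs topic and genre relevance scores of unvisited web pages. The record does not say how the scores are computed or combined, so the following is only a minimal sketch under stated assumptions: both scores lie in [0, 1], they are mixed linearly, and the crawl frontier is a priority queue. The names `combined_score`, `Frontier` and `topic_weight` are hypothetical and do not come from the paper.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class FrontierEntry:
    # heapq is a min-heap, so the negated score is stored to pop the best URL first.
    neg_score: float
    url: str = field(compare=False)


def combined_score(topic_score, genre_score, topic_weight=0.7):
    """Hypothetical linear mix of two relevance scores, both assumed to be in [0, 1]."""
    return topic_weight * topic_score + (1.0 - topic_weight) * genre_score


class Frontier:
    """Priority queue of unvisited URLs, ordered by combined relevance (assumed design)."""

    def __init__(self, topic_weight=0.7):
        self.topic_weight = topic_weight
        self._heap = []
        self._seen = set()

    def push(self, url, topic_score, genre_score):
        # Skip URLs that were already queued so that each page is scored once.
        if url in self._seen:
            return
        self._seen.add(url)
        score = combined_score(topic_score, genre_score, self.topic_weight)
        heapq.heappush(self._heap, FrontierEntry(-score, url))

    def pop(self):
        # Return the URL with the highest combined score.
        return heapq.heappop(self._heap).url

    def __len__(self):
        return len(self._heap)
```

Under these assumptions, an agent would repeatedly call `pop()` to pick the next page to visit and `push()` for each outgoing link it scores. Adapting to a topic during search could then be approximated by adjusting `topic_weight`, although the abstract does not state that the framework works this way.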