VRPSOFC: A framework for focused crawler using mutation improving particle swarm optimization algorithm

Guangxia Xu, Peng Jiang, Chuang Ma, Mahmoud Daneshmand, Shaoci Xie

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

3 Scopus citations

Abstract

The focused crawler is a key technology of search engines. It filters webpages using relevance algorithms until a stopping condition is met. Current focused crawlers are prone to topic drift and low precision while crawling webpages. This paper therefore proposes VRPSOFC, a focused crawler framework based on a mutation-improved particle swarm optimization algorithm. First, for each topic, VRPSOFC selects three types of seed pages, chosen according to page click rate in Google search because they readily lead to large aggregations of webpages: an official website, a Wikipedia page, and a forum or video page. VRPSOFC then crawls webpages using the mutation-improved particle swarm optimization algorithm proposed in the paper, with each seed page serving as an initial page. Finally, experiments are run in a real web environment and the results are analyzed. Compared with the traditional vector space model (VSM) and other methods, VRPSOFC obtains more accurate URL priorities and crawls higher-quality webpages, which indicates that the proposed topic crawler framework is effective.
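
The abstract does not specify the fitness function, the particle encoding, or the mutation operator used by VRPSOFC. As an illustration of the general technique only, the following is a minimal sketch of particle swarm optimization with a simple mutation step. Everything in it is an assumption made for the example: the toy `sphere` objective (standing in for whatever URL-priority fitness the paper uses), the function name `pso_with_mutation`, the parameter values, and the choice of mutation (re-sampling one coordinate at random). It is not the authors' algorithm.

```python
import random


def sphere(x):
    """Toy objective to minimize; a stand-in for the paper's unspecified fitness."""
    return sum(v * v for v in x)


def pso_with_mutation(fitness, dim=3, n_particles=20, iters=100,
                      w=0.7, c1=1.5, c2=1.5, mutation_rate=0.1,
                      bounds=(-5.0, 5.0), seed=0):
    """Generic PSO with a random-reset mutation (illustrative assumption)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

            # Mutation: with small probability, re-sample one coordinate so the
            # swarm keeps exploring instead of collapsing onto one region.
            if rng.random() < mutation_rate:
                d = rng.randrange(dim)
                pos[i][d] = rng.uniform(lo, hi)

            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val


best, best_val = pso_with_mutation(sphere)
print(best, best_val)
```

In a crawler setting, one plausible reading is that the position vector would encode weights used to score candidate URLs, and the mutation step would help the search escape a locally good but off-topic region. That mapping is a guess based on the abstract, not a description of the paper's design.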

Original language: English
Title of host publication: Proceedings of the ACM Turing Celebration Conference - China, ACM TURC 2019
ISBN (Electronic): 9781450371582
State: Published - 17 May 2019
Event: 2019 ACM Turing Celebration Conference - China, ACM TURC 2019 - Chengdu, China
Duration: 17 May 2019 to 19 May 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2019 ACM Turing Celebration Conference - China, ACM TURC 2019
Country/Territory: China
City: Chengdu
Period: 17/05/19 to 19/05/19

Keywords

  • Focused crawler
  • Mutation
  • Particle swarm algorithm
  • Precision
  • Topic-drift
