Massive Fishing Website URL Parallel Filtering Method

Dongliang Xu, Jingchang Pan, Xiaojiang Du, Bailing Wang, Meng Liu, Qinma Kang

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

A randomized fingerprint model is proposed, which can effectively reduce the false positive rate by generating a unique fingerprint for each URL. The model is also used to improve the Wu and Manber (WM) algorithm, which is a multi-string matching algorithm; as a result, a randomized fingerprint WM (RFP-WM) algorithm is proposed. Furthermore, a Graphics Processing Unit (GPU)-based parallel randomized fingerprint algorithm (GRFP-WM) is implemented. Experimental results indicate that, for a massive pattern set containing more than a million URLs, the efficiency of the RFP-WM algorithm is 20% higher than that of the WM algorithm. The WM algorithm's efficiency is approximately 7% higher than that of the Aho and Corasick (AC) algorithm, which is also a multi-string matching algorithm. The efficiency and speedup of the GRFP-WM algorithm are higher than those of the GPU-based WM and the GPU-based AC algorithms. These results indicate that the randomized fingerprint model can effectively reduce the collision rate and improve the efficiency of the algorithm.
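
The abstract does not spell out how the fingerprint is built, so the following is only a minimal sketch of the general idea, not the authors' RFP-WM or GRFP-WM implementation. It assumes a Rabin-Karp style polynomial fingerprint with a randomly chosen base, and it swaps the WM shift table for a plain rolling-window prefilter. The names `FingerprintFilter`, `MOD`, and `search` are hypothetical. Each pattern is indexed by the fingerprint of its first `m` characters (with `m` the shortest pattern length), and every fingerprint hit is verified by exact comparison, so a collision costs time but never produces a false match.

```python
import random

# Assumption: a 61-bit Mersenne prime as the fingerprint modulus.
MOD = (1 << 61) - 1


class FingerprintFilter:
    """Hypothetical multi-pattern matcher with a randomized fingerprint prefilter."""

    def __init__(self, patterns, seed=None):
        if not patterns or any(not p for p in patterns):
            raise ValueError("patterns must be non-empty strings")
        rng = random.Random(seed)
        # A random base makes collisions hard to predict for any fixed input.
        self.base = rng.randrange(256, MOD - 1)
        # Window length: the shortest pattern, as in WM-style block matching.
        self.m = min(len(p) for p in patterns)
        # Map the fingerprint of each pattern prefix to the patterns sharing it.
        self.table = {}
        for p in patterns:
            self.table.setdefault(self._fp(p[:self.m]), []).append(p)
        # base^(m-1) mod MOD, used to drop the leading character when rolling.
        self.high = pow(self.base, self.m - 1, MOD)

    def _fp(self, s):
        """Polynomial fingerprint of s under the random base."""
        h = 0
        for ch in s:
            h = (h * self.base + ord(ch)) % MOD
        return h

    def search(self, text):
        """Return (offset, pattern) pairs for every verified occurrence."""
        hits = []
        m = self.m
        if len(text) < m:
            return hits
        h = self._fp(text[:m])
        for i in range(len(text) - m + 1):
            # Fingerprint hit: verify exactly so collisions cannot leak through.
            for p in self.table.get(h, ()):
                if text.startswith(p, i):
                    hits.append((i, p))
            # Roll the window one character to the right.
            if i + m < len(text):
                h = ((h - ord(text[i]) * self.high) * self.base
                     + ord(text[i + m])) % MOD
        return hits
```

As a usage example, `FingerprintFilter(["phish.test", "evil.example/login"]).search(line)` returns the offset and pattern of each blacklisted URL found in `line`. Because each input line can be scanned independently, this structure also hints at why the approach parallelizes naturally, although the GPU mapping used in the paper is not described in the abstract.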

Original language: English
Pages (from-to): 2378-2388
Number of pages: 11
Journal: IEEE Access
Volume: 6
State: Published - 12 Dec 2017

Keywords

  • GRFP-WM
  • URL filtering
  • randomized fingerprint model
