TY - JOUR
T1 - Massive Fishing Website URL Parallel Filtering Method
AU - Xu, Dongliang
AU - Pan, Jingchang
AU - Du, Xiaojiang
AU - Wang, Bailing
AU - Liu, Meng
AU - Kang, Qinma
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2017/12/12
Y1 - 2017/12/12
AB - A randomized fingerprint model is proposed, which can effectively reduce the false positive rate by generating a unique fingerprint for each URL. The model is also used to improve the Wu and Manber (WM) algorithm, which is a multi-string matching algorithm; as a result, a randomized fingerprint WM (RFP-WM) algorithm is proposed. Furthermore, a Graphics Processing Unit (GPU)-based parallel randomized fingerprint algorithm (GRFP-WM) is implemented. Experimental results indicate that, for a massive pattern set containing more than a million URLs, the efficiency of the RFP-WM algorithm is 20% higher than that of the WM algorithm. The WM algorithm's efficiency is approximately 7% higher than that of the Aho and Corasick (AC) algorithm, which is also a multi-string matching algorithm. The efficiency and speedup of the GRFP-WM algorithm are higher than those of the GPU-based WM and the GPU-based AC algorithms. These results indicate that the randomized fingerprint model can effectively reduce the collision rate and improve the efficiency of the algorithm.
KW - GRFP-WM
KW - URL filtering
KW - randomized fingerprint model
UR - http://www.scopus.com/inward/record.url?scp=85038853426&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85038853426&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2017.2782847
DO - 10.1109/ACCESS.2017.2782847
M3 - Article
AN - SCOPUS:85038853426
VL - 6
SP - 2378
EP - 2388
JO - IEEE Access
JF - IEEE Access
ER -