[1] ZOU Jian-xin, LI Hong-ling. Anonymous Crawler Detection Based on Web Access[J]. Computer Technology and Development, 2017, 27(12): 103-107. [doi:10.3969/j.issn.1673-629X.2017.12.023]

Anonymous Crawler Detection Based on Web Access

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
27
Issue:
2017, No. 12
Pages:
103-107
Column:
Security and Prevention
Publication date:
2017-12-10

Article Info

Title:
Anonymous Crawler Detection Based on Web Access
Article ID:
1673-629X(2017)12-0103-05
Author(s):
ZOU Jian-xin (邹建鑫), LI Hong-ling (李红灵)
Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650000, China
Keywords:
web crawler; robot exclusion protocol; website access; camouflage crawler detection
CLC number:
TP393.08
DOI:
10.3969/j.issn.1673-629X.2017.12.023
Document code:
A
Abstract:
Malicious web crawlers that disguise themselves as browsers are hard to identify, and web log analysis tools do not support the detection of anonymous crawlers. To address these problems, this paper analyzes how web crawlers access web pages and summarizes existing malicious-crawler detection algorithms based on the robot exclusion protocol and on crawler behavior. Inspired by these algorithms, it proposes a behavior-based algorithm for detecting anonymous crawlers. The algorithm detects crawlers mainly by the differences between human visitors and crawlers in the length of access time and the access cycle, and it is verified experimentally. The experimental data come from the web log of a single server and are processed with Python to detect anonymous crawlers; the results are compared with those of current mainstream anonymous-crawler detection algorithms. The results show that the proposed algorithm can detect anonymous crawlers with low concurrency.
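
The abstract describes detecting crawlers from the timing of their visits in a server log. The paper's actual algorithm, features, and thresholds are not given on this page, so the following Python sketch is only an illustration of that general idea under stated assumptions: the log is assumed to be in Common Log Format, and the function names, the sample log, and every threshold (`min_requests`, `max_mean_gap`, `max_cv`) are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Matches the start of a Common Log Format line: client IP and timestamp.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_log(lines):
    """Group request timestamps (seconds since the epoch) by client IP."""
    hits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        ip, stamp = match.groups()
        hits[ip].append(datetime.strptime(stamp, TIME_FORMAT).timestamp())
    return hits


def suspected_crawlers(hits, min_requests=5, max_mean_gap=2.0, max_cv=0.2):
    """Return IPs whose requests are either very fast or very regular.

    All thresholds are hypothetical and would need tuning on real logs.
    """
    flagged = []
    for ip, times in hits.items():
        if len(times) < min_requests:
            continue  # too few requests to judge
        times = sorted(times)
        gaps = [b - a for a, b in zip(times, times[1:])]
        avg = mean(gaps)
        # Coefficient of variation: low values mean near-periodic access.
        cv = pstdev(gaps) / avg if avg > 0 else 0.0
        if avg <= max_mean_gap or cv <= max_cv:
            flagged.append(ip)
    return sorted(flagged)


# Made-up sample: 10.0.0.1 requests a page every 10 seconds,
# 10.0.0.2 browses at irregular, human-like intervals.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Dec/2017:10:00:00 +0800] "GET /p0 HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Dec/2017:10:00:10 +0800] "GET /p1 HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Dec/2017:10:00:20 +0800] "GET /p2 HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Dec/2017:10:00:30 +0800] "GET /p3 HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Dec/2017:10:00:40 +0800] "GET /p4 HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Dec/2017:10:00:50 +0800] "GET /p5 HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Dec/2017:10:00:05 +0800] "GET / HTTP/1.1" 200 2048',
    '10.0.0.2 - - [10/Dec/2017:10:01:40 +0800] "GET /news HTTP/1.1" 200 4096',
    '10.0.0.2 - - [10/Dec/2017:10:02:10 +0800] "GET /about HTTP/1.1" 200 1024',
    '10.0.0.2 - - [10/Dec/2017:10:07:55 +0800] "GET /news/1 HTTP/1.1" 200 4096',
    '10.0.0.2 - - [10/Dec/2017:10:08:03 +0800] "GET /news/2 HTTP/1.1" 200 4096',
    '10.0.0.2 - - [10/Dec/2017:10:15:30 +0800] "GET /contact HTTP/1.1" 200 512',
]

if __name__ == "__main__":
    print(suspected_crawlers(parse_log(SAMPLE_LOG)))
```

The sketch deliberately ignores the User-Agent header, since a camouflaged crawler can forge it; it relies only on timing. Because the periodicity check does not require a high request rate, a slow but regular client can still be flagged, which is the kind of low-concurrency crawler the abstract says the proposed algorithm detects.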


Last Update: 2018-03-06