[1] LUO Lin-bo, CHEN Qi, WU Qing-xiu. Research on Topical Crawler of Shark-Search Algorithm and Hits Algorithm[J]. Computer Technology and Development, 2010, (11): 76-79.

Research on a Topical Crawler Based on the Shark-Search and Hits Algorithms

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2010, No. 11
Pages:
76-79
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
Research on Topical Crawler of Shark-Search Algorithm and Hits Algorithm
Article ID:
1673-629X(2010)10-0076-04
Author(s):
LUO Lin-bo [1], CHEN Qi [1], WU Qing-xiu [2]
[1] College of Information Science and Technology, Hainan University
[2] Hainan Software Profession Institute
Keywords:
topical crawler; crawling strategy; vertical search engine
CLC number:
TP393
Document code:
A
Abstract:
The topical crawler is the core technology of a vertical search engine. This paper introduces two important crawling algorithms for topical crawlers: Shark-Search, which evaluates page content, and Hits, which analyzes the link relationships between pages. After analyzing the strengths and weaknesses of each, it proposes a new topical crawling strategy that combines content-based evaluation with link-based analysis to judge the quality of URLs awaiting download, and implements a topical crawler based on this strategy. The new strategy compensates for the respective shortcomings of the two algorithms. In a comparison against topical crawlers built on Shark-Search alone and on Hits alone, the crawler using the new algorithm achieved higher precision than either.
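The abstract does not give the formula used to merge the two signals, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple linear weighting (`alpha`) between a content score and a Hits authority score. The names `hits`, `content_score`, and `combined_priority` are hypothetical, and `content_score` is a toy term-overlap measure standing in for Shark-Search's richer relevance estimate.

```python
import math


def hits(graph, iterations=50):
    """Run the Hits power iteration on a directed graph {page: [linked pages]}.

    Returns (hub, authority) dicts with L2-normalized scores.
    """
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub score of a page: sum of authority scores of the pages it links to.
        new_hub = {n: sum(auth[d] for d in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


def content_score(text, topic_terms):
    """Toy relevance: fraction of topic terms that occur in the text.

    Assumption: a stand-in for Shark-Search's content-based evaluation.
    """
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def combined_priority(content, authority, alpha=0.5):
    """Assumed linear blend of the content and link signals, both in [0, 1]."""
    return alpha * content + (1 - alpha) * authority


if __name__ == "__main__":
    # Hypothetical link graph and anchor/context text for frontier URLs.
    graph = {"a": ["c"], "b": ["c"], "c": []}
    context = {"a": "cooking recipes", "b": "web crawler design", "c": "topical crawler search"}
    topic = ["crawler", "search"]

    _, authority = hits(graph)
    ranked = sorted(
        graph,
        key=lambda url: combined_priority(content_score(context[url], topic), authority[url]),
        reverse=True,
    )
    print(ranked)
```

A crawler built this way would download URLs in descending priority order. Raising `alpha` favors pages whose text matches the topic, while lowering it favors pages that many hubs link to.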


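Memo

Funding: Natural Science Foundation of Hainan Province (609003); Hainan University Research Project (hd09xm84).
Author biographies: LUO Lin-bo (born 1982), male, from Huanggang, Hubei, master's student, research interest: data mining; CHEN Qi, associate professor, Ph.D., master's supervisor, research interest: data mining.
Last Update: 1900-01-01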