LIN Zi-hao. Design and Implementation of Topic-focused Crawler[J]. Computer Technology and Development, 2014, 24(08): 99-102.

Design and Implementation of Topic-focused Crawler

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume: 24
Issue: 2014, No. 08
Pages: 99-102
Section: Intelligence, Algorithms, and Systems Engineering
Publication date: 2014-08-10

Article Info

Title: Design and Implementation of Topic-focused Crawler
Article ID: 1673-629X(2014)08-0099-04
Author(s): LIN Zi-hao
Affiliation: College of Computer, Nanjing University of Posts and Telecommunications
Keywords: topic crawler; HITS algorithm; topic similarity
CLC number: TP31
Document code: A
Abstract:
In the era of information explosion, the results returned by general-purpose search engines can no longer meet users' needs, and vertical search engines, which can obtain more accurate and comprehensive information, are attracting increasing attention. The topic crawler, as the core component of a vertical search engine, has long been a research focus in the search field. Based on an analysis of the structure and characteristics of topic crawlers, this paper builds a topic crawler by introducing its own topic-similarity measure together with the HITS page-ranking algorithm. The paper presents the concrete steps for implementing the crawler and reports an experiment that uses cloud computing as the topic. The experimental results demonstrate the practical applicability of the topic crawler.
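The abstract names the HITS page-ranking algorithm but does not give its formulation or say how the paper combines it with its own topic-similarity measure. As a hedged illustration only, the Python sketch below shows the standard HITS hub/authority iteration (Kleinberg's updates with L2 normalisation) on a small link graph. The function name `hits`, the dict-of-lists graph representation, the iteration limit and the tolerance are all assumptions for illustration, not details taken from the paper, and the topic-similarity part is deliberately left out because the abstract does not describe it.

```python
import math


def hits(graph, iterations=50, tol=1e-8):
    """Standard HITS on a directed link graph (illustrative sketch).

    graph maps each page to the list of pages it links to (hypothetical format).
    Returns (hub, authority) score dicts, each L2-normalised.
    """
    # Collect every page that appears as a source or a link target.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        new_auth = {n: v / norm for n, v in new_auth.items()}
        # Hub update: sum of (new) authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            new_hub[src] = sum(new_auth[dst] for dst in targets)
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        new_hub = {n: v / norm for n, v in new_hub.items()}
        # Stop early once the scores no longer change noticeably.
        delta = sum(abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n]) for n in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


# Hypothetical four-page link graph: a and b both link to c and d, c links to d.
links = {
    "a": ["c", "d"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
}
hub, auth = hits(links)
best = max(auth, key=auth.get)  # page d receives the most hub-weighted in-links
```

In a topic crawler, scores like these are typically used to reorder candidate URLs in the crawl frontier; whether the paper ranks by authority, hub, or a blend with its similarity score is not stated in this abstract.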


Last Update: 2015-03-26