[1] ZHANG Hai-liang, YUAN Dao-hua. Focused Crawling Based on Genetic Algorithms[J]. Computer Technology and Development, 2012, (08): 48-52.

Focused Crawling Based on Genetic Algorithms

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2012, No. 08
Pages:
48-52
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
Focused Crawling Based on Genetic Algorithms
Article ID:
1673-629X(2012)08-0048-05
Author(s):
ZHANG Hai-liang, YUAN Dao-hua
College of Computer Science, Sichuan University
Keywords:
genetic algorithm; crawler; focused crawler; topic similarity; web page importance
Classification number:
TP301.6
Document code:
A
Abstract:
The search strategies used by current focused web crawlers have difficulty finding a globally optimal solution. Building on an analysis of genetic algorithms, this paper designs a focused crawler scheme based on a genetic algorithm. The scheme introduces a PageRank algorithm that incorporates text content, computes the topic similarity of each web page with the vector space model, and judges the importance of a page from both its link structure and its topic similarity. The genetic factors used during crawling are selected according to page importance, and a fitness function filters for pages relevant to the topic. Compared with an ordinary focused crawler, the strategy retrieves a large number of pages with high topic similarity, raises the importance of the pages retrieved, and meets users' needs for retrieving pages on the topics they are interested in. To a certain extent, it therefore addresses the problem described above.
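
The abstract names the building blocks (vector space model similarity, a page importance score that combines link structure with topic similarity, and a fitness function used as a filter) but gives no formulas. The following is only a minimal sketch of how those pieces could fit together, not the authors' implementation. The whitespace tokenizer, the linear blend weight `alpha`, the `threshold` value, and all function names are assumptions made for illustration; the link scores are taken as given inputs rather than computed with PageRank.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector from lowercase, whitespace-split tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def importance(link_score, similarity, alpha=0.5):
    """Hypothetical linear blend of a link-based score and topic similarity."""
    return alpha * link_score + (1 - alpha) * similarity


def select_by_fitness(pages, topic, link_scores, threshold=0.3):
    """Keep pages whose blended score reaches the (assumed) fitness threshold."""
    topic_vec = tf_vector(topic)
    kept = []
    for url, text in pages.items():
        sim = cosine_similarity(tf_vector(text), topic_vec)
        fit = importance(link_scores.get(url, 0.0), sim)
        if fit >= threshold:
            kept.append((url, fit))
    # Highest-scoring pages first, so they can seed the next crawl round.
    return sorted(kept, key=lambda p: p[1], reverse=True)


pages = {
    "u1": "genetic algorithm crawler",
    "u2": "cooking recipes pasta",
}
link_scores = {"u1": 0.4, "u2": 0.4}
selected = select_by_fitness(pages, "genetic algorithm", link_scores)
```

With these toy inputs only `u1` passes the filter, because `u2` shares no terms with the topic and its link score alone falls below the threshold. In a genetic-algorithm crawler the surviving pages would play the role of the selected population; crossover and mutation over their outgoing links are not shown here.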


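Memo:
ZHANG Hai-liang (born 1987), male, from Chengdu, Sichuan, M.S.; research interests: distributed parallel processing and network computing. YUAN Dao-hua, professor and master's supervisor; research interests: distributed parallel processing and network computing.
Last Update: 1900-01-01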