[1] ZHAO Si-jia, YIN Ting. Research of Personalization Theme Crawler Based on Rule Engine[J]. Computer Technology and Development, 2011, (03): 56-59.

Research of Personalization Theme Crawler Based on Rule Engine

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
Issue:
2011, No. 03
Pages:
56-59
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
Research of Personalization Theme Crawler Based on Rule Engine
Article ID:
1673-629X(2011)03-0056-04
Author(s):
ZHAO Si-jia (赵思佳), YIN Ting (尹婷)
School of Information Science and Engineering, Central South University
Keywords:
rule engine; focused crawler; search engine
CLC number:
TP31
Document code:
A
Abstract:
Information on the Internet is growing rapidly, and people rely mainly on search engines to find it. As specialization increases, vertical search engines have become a new tool, but building a specialized search engine is a fairly complex process. To address the inflexible configuration of focused crawlers in vertical search engines, a rule engine is integrated into the crawler so that a rule base controls how the crawler runs. Using the highly extensible open-source crawler project Heritrix and the open-source rule engine project Drools, a personalized crawler is built that is easy to configure and highly flexible. This changes the configuration of the focused crawler from tightly coupled to loosely coupled and makes configuration easier for users.
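
The idea of letting a rule base, rather than crawler code, decide which links are followed can be illustrated with a minimal sketch. It is written in plain Python and does not use the Heritrix or Drools APIs; the `Rule`, `RuleEngine`, and `filter_links` names, the rule fields, and the example URLs are all hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical rule-base entry: if `pattern` matches a URL, apply `action`."""
    name: str
    pattern: str       # regular expression searched for in the URL
    action: str        # "accept" or "reject"
    priority: int = 0  # rules with higher priority are evaluated first


class RuleEngine:
    """Evaluates URLs against a rule base that is kept separate from the crawler."""

    def __init__(self, rules):
        # Sort once so that decide() checks high-priority rules first.
        self._rules = sorted(rules, key=lambda r: r.priority, reverse=True)

    def decide(self, url, default="reject"):
        # The first matching rule wins; URLs matching no rule get the default.
        for rule in self._rules:
            if re.search(rule.pattern, url):
                return rule.action
        return default


def filter_links(engine, links):
    """Crawler-side hook: keep only the links the rule engine accepts."""
    return [url for url in links if engine.decide(url) == "accept"]


# The rule base is plain data, so a user can change crawl behaviour
# without touching the crawler code (the loose coupling described above).
RULE_BASE = [
    Rule("skip-binaries", r"\.(jpg|png|zip|exe)$", "reject", priority=10),
    Rule("topic-pages", r"/(crawler|search)/", "accept", priority=5),
]

engine = RuleEngine(RULE_BASE)
links = [
    "http://example.com/crawler/intro.html",
    "http://example.com/crawler/logo.png",
    "http://example.com/sports/news.html",
]
accepted = filter_links(engine, links)
print(accepted)
```

In this sketch only the first link is kept: the second is rejected by the higher-priority binary-file rule, and the third matches no rule and falls back to the default. In a real system along the lines the abstract describes, the rules would presumably live in a Drools rule base and the hook would sit inside Heritrix's link-processing chain; that wiring is not shown here.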


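Memo:
Funding: Electronic Development Fund of the Ministry of Information Industry (信部运[2006]634号).
Author biography: ZHAO Si-jia (born 1983), male, from Hengyang, Hunan; lecturer at Hunan Polytechnic of Environment and Biology; research interests: computer networks and information systems.
Last Update: 1900-01-01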