YANG Yang, LI Xiao-feng, ZHAO He, et al. Research and Realization of Academic Search System Based on Network Crawler[J]. Computer Technology and Development, 2014, 24(11): 35-38.

 Research and Realization of Academic Search System Based on Network Crawler

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
24
Issue:
No. 11, 2014
Pages:
35-38
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2014-11-10

Article Info

Title:
 Research and Realization of Academic Search System Based on Network Crawler
Article ID:
1673-629X(2014)11-0035-04
Author(s):
 YANG Yang[1][2], LI Xiao-feng[1][2], ZHAO He[1][3], LIU Bing[1][2]
1. Hefei Institutes of Physical Science, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences; 3. University of Science and Technology of China
Keywords:
 network crawler; ontology; thesis retrieval; Web MVC; load balancing
CLC number:
TP393.4
Document code:
A
Abstract:
 The system uses network crawler technology to search academic resources intelligently and to capture their key information. The collected information is classified and identified with an ontology-based method, and the resources are stored automatically in a local repository. The download subsystem applies load balancing to distribute download tasks evenly across multiple download servers. Protobuf socket communication, a high-efficiency mechanism, gives the system an efficient and accurate internal download service. A single portal for the whole institute records retrieval and download activity, which avoids repeated downloads of the same resource and makes retrieval and download behavior traceable, providing data support for library information management and research. Access control is also designed to prevent malicious and excessive downloading. The system can effectively reduce an institute's spending on digital academic resources and its network bandwidth usage.
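
The abstract does not say how download tasks are balanced or how repeated downloads are detected. The Python sketch below illustrates one possible reading under stated assumptions: a least-loaded policy over an in-memory heap, and a dictionary keyed by document identifier to suppress duplicates. The class name `DownloadDispatcher`, the server names, and the document identifiers are hypothetical and do not come from the paper.

```python
import heapq


class DownloadDispatcher:
    """Illustrative only: reuse stored documents, else pick the least-loaded server."""

    def __init__(self, servers):
        # Heap of (assigned_task_count, server_name). Tasks never finish in
        # this sketch, so the count only grows and assignment stays even.
        self._heap = [(0, name) for name in servers]
        heapq.heapify(self._heap)
        self._stored = {}  # document id -> server that holds the document
        self.log = []      # (user, document id, was_duplicate) for traceability

    def request(self, user, doc_id):
        duplicate = doc_id in self._stored
        if not duplicate:
            # Assign the task to the server with the fewest assigned tasks.
            load, server = heapq.heappop(self._heap)
            heapq.heappush(self._heap, (load + 1, server))
            self._stored[doc_id] = server
        # Every request is recorded, whether or not it triggered a download.
        self.log.append((user, doc_id, duplicate))
        return self._stored[doc_id]
```

Routing every request through one `request` method mirrors the single-portal idea: the same call both records the access and decides whether a new download is needed.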

Last Update: 2015-04-03