
基于 WebMagic 爬取技术的电力事故信息获取

Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:: 29
Issue:: 2019, No. 06

Pages:: 125-129

Column:: Applied Development Research

Publication date:: 2019-06-10

Article Info

Title:: Acquisition of Electric Power Accident Information Based on WebMagic Crawling Technology

Article ID:: 1673-629X(2019)06-0125-05

Authors (original names):: 党佩; 阎光伟; 华北电力大学控制与计算机工程学院, Beijing 102206

Author(s):: DANG Pei; YAN Guang-wei; School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China

Keywords (Chinese):: 电力事故; 网络爬虫; WebMagic; 数据抓取

Keywords:: electric power accident; web crawler; WebMagic; data crawling

Classification number:: TP39

DOI:: 10.3969/j.issn.1673-629X.2019.06.026

Abstract (Chinese original):: 当前国民经济正处于迅猛发展的大好时期,也是电力工业体制改革的关键时期,对电力的需求十分紧迫,所以,电力系统的安全稳定运行及人员的安全管理日益成为影响电力工业发展的关键要素。近年来,各类电力事故依旧时有发生,全面调查事故发生原因是非常必要的,因此,进行事故信息的收集、管理和分析成为关键的一步。采用传统的方式,人工使用搜索引擎搜索信息,费时费力,而随着互联网技术的不断发展,网络爬虫技术已日渐成熟,应用网络爬虫技术可以快速获取这类事故信息。文中主要应用 WebMagic 爬虫技术,利用 XPath 和正则表达式指定信息的抽取规则,从电力安全管理网上抓取有关于电力事故信息的新闻,匹配符合要求的事故描述信息,下载到本地并实现数据存储进数据库,为之后进行事故信息分析提供数据基础。实验结果显示,该技术能够准确、迅速地获取数据,且爬虫程序简单易维护。

Abstract:: China's national economy is developing rapidly and its electric power industry is at a crucial stage of structural reform, so the demand for electric power is urgent. The safe and stable operation of the power system and the safety management of personnel have therefore become key factors in the development of the electric power industry. Electric power accidents of various types have continued to occur in recent years, and their causes need to be investigated thoroughly, which makes the collection, management, and analysis of accident information a crucial step. The traditional approach of manually querying search engines is time-consuming and laborious, whereas web crawling technology has matured with the development of the Internet and can obtain such accident information quickly. This paper applies WebMagic crawler technology, using XPath and regular expressions to specify the information extraction rules. It crawls news about electric power accidents from the Electric Power Security Management Network, matches the accident descriptions that meet the requirements, downloads them locally, and stores the data in a database, providing a data foundation for subsequent analysis of accident information. Experimental results show that the approach acquires data accurately and quickly, and that the crawler program is simple and easy to maintain.
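The extraction step described in the abstract (XPath to locate page elements, regular expressions to filter and pull out fields) can be illustrated with a minimal sketch. This is not the authors' code and does not use WebMagic itself: it relies only on the Java standard library so that it runs without extra dependencies. The class name `AccidentExtractionSketch`, the page layout (`ul class='news'`), the keyword pattern, and the date format are all hypothetical assumptions made for illustration. In a WebMagic crawler, comparable rules would sit inside a `PageProcessor` implementation, and the database storage step mentioned in the abstract is omitted here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AccidentExtractionSketch {

    // Hypothetical rule: keep only headlines that mention an accident.
    private static final Pattern ACCIDENT =
            Pattern.compile("accident", Pattern.CASE_INSENSITIVE);

    // Hypothetical rule: pull a yyyy-MM-dd date out of free text.
    private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    /** Returns "date | title" for each list item whose title matches ACCIDENT. */
    public static List<String> extract(String xhtml) throws Exception {
        // The standard XML parser requires well-formed markup (an assumption here).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // XPath selects the candidate news items on the (assumed) list page.
        NodeList items = (NodeList) xpath.evaluate(
                "//ul[@class='news']/li", doc, XPathConstants.NODESET);

        List<String> rows = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            String title = xpath.evaluate("a", items.item(i)).trim();
            String meta = xpath.evaluate("span", items.item(i));
            Matcher date = DATE.matcher(meta);
            // Regular expressions filter the items and extract the date field.
            if (ACCIDENT.matcher(title).find() && date.find()) {
                rows.add(date.group() + " | " + title);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><ul class='news'>"
                + "<li><a href='/n/1'>Substation electric shock accident report</a>"
                + "<span>Posted 2018-03-12</span></li>"
                + "<li><a href='/n/2'>Annual safety training notice</a>"
                + "<span>Posted 2018-03-15</span></li>"
                + "</ul></body></html>";
        for (String row : extract(page)) {
            System.out.println(row);
        }
    }
}
```

Keeping the selection rule (XPath) separate from the matching rule (regular expressions) means a change in the site's layout only touches the XPath string, which is one plausible reading of why the abstract calls the crawler easy to maintain.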
