[1] GONG Li-dong, YIN Jian-hua. Research on Octoparse-based IPE Environmental Data Scraping[J]. Computer Technology and Development, 2022, 32(04): 200-204. [doi:10.3969/j.issn.1673-629X.2022.04.034]

Research on Octoparse-based IPE Environmental Data Scraping

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
2022, No. 04
Pages:
200-204
Section:
Application Frontiers and Comprehensive Topics
Publication date:
2022-04-10

Article Info

Title:
Research on Octoparse-based IPE Environmental Data Scraping
Article ID:
1673-629X(2022)04-0200-05
Author(s):
GONG Li-dong 1,2; YIN Jian-hua 2
1. China Energy Engineering Group Co., Ltd., Beijing 100022, China;
2. University of International Business and Economics, Beijing 100029, China
Keywords:
data scraping; Octoparse; Python; Institute of Public & Environmental Affairs (IPE); policy evaluation
CLC number:
TP274+.2; TP391
DOI:
10.3969/j.issn.1673-629X.2022.04.034
Abstract:
As the data held in traditional databases become increasingly saturated through use, researchers investigating questions in specific settings urgently need more flexible and diverse data sources, and abundant Web resources offer a convenient channel. To address the shortcomings of traditional data scraping techniques such as Python, namely a high programming threshold and high memory consumption, we introduce Octoparse, a C#-based data scraping technology, and analyze its principles, advantages and limitations in data collection. Taking the Institute of Public & Environmental Affairs (IPE) as the experimental platform, we design a set of highly replicable and extensible scraping rules and use them to collect the environmental penalty records of 758 state-controlled key wastewater monitoring enterprises in the Beijing-Tianjin-Hebei region (Jing-Jin-Ji), the Yangtze River Delta and the Pearl River Delta from 2004 to 2017. The experiment shows that, compared with Python, Octoparse offers more convenient rule configuration, more stable batch collection and more diverse data export formats. It not only effectively lowers the programming threshold but also avoids the data loss caused by blind operation, achieving WYSIWYG (what you see is what you get). The rule set can provide high-quality data support for evaluating local environmental policy and forecasting regional environmental economy.

Last Update: 2022-04-10