
Research on a Data Structuralization Method of Bamboo Species Based on Regular Extraction Model


Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 28
Issue:: 2018, No. 06

Pages:: 147-150

Section:: Application Development Research

Publication date:: 2018-06-10

Article Info

Title:: Research on a Data Structuralization Method of Bamboo Species Based on Regular Extraction Model

Article ID:: 1673-629X(2018)06-0147-04

Author(s):: LI Xin (李欣); LI Shao-wen (李绍稳); XU Gao-jian (许高建); LIN Jian-bin (林建彬); School of Information and Computer Science, Anhui Agricultural University, Hefei, Anhui 230036, China

Keywords:: information extraction; regular expression; bamboo species data; data structuring

CLC number:: TP391

DOI:: 10.3969/j.issn.1673-629X.2018.06.033

Document code:: A

Abstract:: This study applies rule-based information extraction to the automatic extraction and structured storage of morphological data on bamboo germplasm resources (bamboo species for short), and proposes a data structuring method for bamboo species based on a regular-expression extraction model, with the aim of building a bamboo species database quickly. The method uses the table structure of the bamboo species database as the extraction template and the table's attribute names as rule trigger words; the extraction rules are written as regular expressions and assembled into the regular-expression extraction model. With the online edition of Flora Reipublicae Popularis Sinicae (中国植物志) as the experimental source, bamboo species data were extracted and structured automatically in two steps: web page parsing and field extraction. More than 500 bamboo species records were extracted, and a sampled statistical analysis of the first eight fields of the data table showed that the accuracy of the valid extracted fields exceeded 89%. The results indicate that the regular-expression-based data structuring method is feasible and effective. The method was implemented in a bamboo species information extraction system developed in Java.
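The abstract describes the extraction model only in outline: database column names act as trigger words, each trigger word anchors a regular-expression rule, and extraction runs in two steps (page parsing, then field extraction). The Java sketch below illustrates that idea under stated assumptions and is not the authors' code. The column names, the trigger words (秆, 箨鞘, 叶片, 笋期), the rule shape "trigger word followed by text up to the next ； or 。", and the tag-stripping parser are all hypothetical choices made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BambooRegexExtractor {

    // Hypothetical extraction template: database column name -> trigger word
    // that introduces the attribute in a species description. Not the paper's schema.
    private static final Map<String, String> TEMPLATE = new LinkedHashMap<>();
    static {
        TEMPLATE.put("culm", "秆");
        TEMPLATE.put("culm_sheath", "箨鞘");
        TEMPLATE.put("leaf_blade", "叶片");
        TEMPLATE.put("shooting_period", "笋期");
    }

    // One compiled rule per column (hypothetical rule shape): the quoted trigger
    // word, then a capture group running up to the next Chinese semicolon or full stop.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        for (Map.Entry<String, String> e : TEMPLATE.entrySet()) {
            RULES.put(e.getKey(), Pattern.compile(Pattern.quote(e.getValue()) + "([^；。]*)"));
        }
    }

    // Step 1, simplified: strip HTML tags and collapse whitespace runs to one space.
    static String parsePage(String html) {
        return html.replaceAll("<[^>]+>", "").replaceAll("\\s+", " ").trim();
    }

    // Step 2: apply every rule once; a column whose trigger word is absent maps to null.
    static Map<String, String> extractFields(String text) {
        Map<String, String> record = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(text);
            record.put(rule.getKey(), m.find() ? m.group(1).trim() : null);
        }
        return record;
    }

    public static void main(String[] args) {
        String html = "<div class=\"desc\"><p>秆高5-8米，直径3-5厘米；箨鞘背面有毛；</p>"
                + "<p>叶片长10-15厘米；笋期5月。</p></div>";
        Map<String, String> record = extractFields(parsePage(html));
        record.forEach((column, value) -> System.out.println(column + " = " + value));
    }
}
```

Keeping the template in a `LinkedHashMap` preserves column order, so each extracted record lines up with the table definition, and a missing attribute becomes `null` (an empty column) rather than an error. Regex-based tag stripping is a deliberate simplification; a crawler working on real pages would more likely use a proper HTML parser such as jsoup for the page-parsing step.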
