[1] LI Xue-ju, WANG Zhi-guang, LU Qiang. An Extraction Method for Papers via Integration of Rules with SVM[J]. Computer Technology and Development, 2017, 27(10): 24-29.

An Extraction Method for Papers via Integration of Rules with SVM

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
27
Issue:
2017, No. 10
Pages:
24-29
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2017-10-10

Article Info

Title:
 An Extraction Method for Papers via Integration of Rules with SVM
Article ID:
1673-629X(2017)10-0024-06
Author(s):
 LI Xue-ju, WANG Zhi-guang, LU Qiang
 College of Geophysics and Information Engineering, China University of Petroleum (Beijing)
Keywords:
 PDF papers; rules; support vector machine; sample characteristics; hybrid method; information extraction
CLC number:
TP301
Document code:
A
Abstract:
 Traditional methods for extracting information from PDF papers rely on either rules alone or machine learning alone. Rule-based extraction has a clear advantage on fixed-format data: a few simple rules are enough to locate and extract the data accurately. On flexibly formatted data, however, it requires fairly complex rules and does not adapt to variations in paper format, so it lacks the flexibility and accuracy of machine-learning extraction. To address this, an extraction method for PDF papers that integrates rules with SVM is proposed, drawing on the strengths of both approaches. After fixed-format information is extracted with simple rules, sample features are selected to build a training set, and the optimal kernel function is chosen to generate the SVM model that performs the SVM-based extraction. The SVM extraction results serve as the main body, and the method is validated by making appropriate use of the rule-based extraction results together with a set of suitable rules. Experimental results show that the method performs well in extracting paper metadata, section headings, and similar information.
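
The hybrid idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expressions, the three line features, and the hand-set weights in `linear_decision` are all assumptions made for illustration. In particular, `linear_decision` only mimics the form of a linear SVM decision function, sign(w·x + b), and stands in for a model that would in practice be trained on labeled samples with a selected kernel.

```python
import re
from typing import Callable, Dict, List

# Hypothetical rules for fixed-format fields (patterns are illustrative only).
FIXED_RULES = {
    "article_id": re.compile(r"\b\d{4}-\d{3}[\dX]\(\d{4}\)\d{2}-\d{4}-\d{2}\b"),
    "clc_number": re.compile(r"\bTP\d+\b"),
}


def extract_by_rules(lines: List[str]) -> Dict[str, str]:
    """Return the first match of each fixed-format rule."""
    found: Dict[str, str] = {}
    for line in lines:
        for field, pattern in FIXED_RULES.items():
            match = pattern.search(line)
            if match and field not in found:
                found[field] = match.group(0)
    return found


def line_features(line: str) -> List[float]:
    """Assumed sample features: word count, numbered prefix, capitalized ratio."""
    words = line.split()
    numbered = 1.0 if re.match(r"^\d+(\.\d+)*\s", line) else 0.0
    capitalized = sum(w[:1].isupper() for w in words) / max(len(words), 1)
    return [float(len(words)), numbered, capitalized]


def linear_decision(features: List[float]) -> str:
    """Placeholder for a trained SVM: hand-set weights, not learned ones."""
    weights, bias = [-0.2, 3.0, 1.0], -1.5
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "heading" if score > 0 else "body"


def hybrid_extract(lines: List[str],
                   classify: Callable[[List[float]], str]) -> Dict[str, object]:
    """Classifier output is the main result; rule matches fill fixed fields
    and veto any line that a fixed-format rule has already claimed."""
    result: Dict[str, object] = dict(extract_by_rules(lines))
    headings: List[str] = []
    for line in lines:
        claimed = any(p.search(line) for p in FIXED_RULES.values())
        if not claimed and classify(line_features(line)) == "heading":
            headings.append(line)
    result["headings"] = headings
    return result


if __name__ == "__main__":
    sample = [
        "TP301",
        "1673-629X(2017)10-0024-06",
        "1 Introduction",
        "Traditional extraction methods are based on rules or machine learning.",
        "2.1 Rule-based extraction",
    ]
    print(hybrid_extract(sample, linear_decision))
```

Passing the classifier as a callable keeps the rule stage and the learned stage independent, so a trained model's prediction function could replace `linear_decision` without touching the rules.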

Last Update: 2017-11-23