SUN Long, LI Yan. Research on Feature Extraction of Technology Document Based on Functional Structure Tuple[J]. Computer Technology and Development, 2019, 29(05): 12-16. [doi:10.3969/j.issn.1673-629X.2019.05.003]

Research on Feature Extraction of Technology Document Based on Functional Structure Tuple

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
2019, No. 05
Pages:
12-16
Section:
Intelligence, Algorithms and Systems Engineering
Publication date:
2019-05-10

Article Info

Title:
Research on Feature Extraction of Technology Document Based on Functional Structure Tuple
Article No.:
1673-629X(2019)05-0012-05
Author(s):
SUN Long¹, LI Yan²
1. School of Computer Science, Sichuan University, Chengdu 610065, China; 2. School of Manufacturing Science and Engineering, Sichuan University, Chengdu 610065, China
Keywords:
function tuple; text classification; feature extraction; algorithm design
CLC number:
TP393
DOI:
10.3969/j.issn.1673-629X.2019.05.003
Abstract:
When the vocabulary model represents the features of engineering technology documents, it splits each document into words that are unrelated to one another, which makes it difficult to extract the document's semantic features. If the relationships between words within a sentence are taken into account, and functional structure tuples are extracted as document features according to the semantic structure of engineering technology documents, classification performance can be further improved. Starting from the characteristics of engineering technology documents, and building on a review of existing methods for extracting functional structure tuples from text, we explore combining statistical and rule-based methods. First, feature words extracted by a statistical method are used to filter out noise and meaningless sentences in the document. Then, functional structure tuples are extracted recursively from the remaining sentences, level by level along the syntactic parse tree. To make this extraction more effective, we summarize the recurring regularities in the sentences' lexical analysis (parse) trees as rules. Verification shows that the method effectively improves feature extraction for engineering technology documents.
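The two stages described in the abstract can be sketched in Python as follows. This is a minimal sketch under explicit assumptions, not the authors' implementation: the abstract does not say which statistic selects the feature words, so TF-IDF is assumed; parse trees are assumed to come from an external constituency parser and are written by hand here as nested tuples with Penn Chinese Treebank-style tags; and a functional structure tuple is simplified to a (verb, object head) pair read off VP -> VV NP patterns. The function names feature_words, extract_tuples and document_features, and the toy corpus, are hypothetical.

```python
import math
from collections import Counter

# Assumed tree format: (label, child, ...) with plain strings as leaves, using
# Penn Chinese Treebank-style tags (IP, NP, VP, VV, NN, ...). Trees would normally
# come from an external constituency parser; here they are written by hand.
VERB_TAGS = {"VV"}


def leaves(tree):
    """Return the words under a tree node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


def feature_words(docs, k):
    """Stage 1 (TF-IDF is an assumption): top-k positively scored words per document."""
    token_docs = [[w.lower() for tree in doc for w in leaves(tree)] for doc in docs]
    df = Counter(w for toks in token_docs for w in set(toks))
    n = len(docs)
    result = []
    for toks in token_docs:
        tf = Counter(toks)
        score = {w: c / len(toks) * math.log(n / df[w]) for w, c in tf.items()}
        # Words found in every document score 0 and are never chosen.
        ranked = sorted((w for w in score if score[w] > 0), key=lambda w: (-score[w], w))
        result.append(set(ranked[:k]))
    return result


def extract_tuples(tree):
    """Stage 2: recursively collect (verb, object head) pairs from VP -> VV NP."""
    if isinstance(tree, str):
        return []
    label, children = tree[0], tree[1:]
    found = []
    if label == "VP":
        nodes = [c for c in children if not isinstance(c, str)]
        verbs = [c for c in nodes if c[0] in VERB_TAGS]
        objects = [c for c in nodes if c[0] == "NP"]
        for verb in verbs:
            for np in objects:
                # Simplification: treat the rightmost word of the NP as its head.
                found.append((" ".join(leaves(verb)), leaves(np)[-1]))
    for child in children:
        found.extend(extract_tuples(child))  # descend to the next tree level
    return found


def document_features(docs, k=3):
    """Drop sentences without a feature word, then extract tuples from the rest."""
    results = []
    for doc, words in zip(docs, feature_words(docs, k)):
        kept = [t for t in doc if words & {w.lower() for w in leaves(t)}]
        results.append([pair for t in kept for pair in extract_tuples(t)])
    return results


# Toy corpus: two documents, each a list of parsed sentences.
NOISE = ("IP", ("VP", ("VV", "see"), ("NP", ("DT", "the"), ("NN", "figure"))))
EXAMPLE_DOCS = [
    [
        ("IP", ("NP", ("DT", "the"), ("NN", "gear")),
         ("VP", ("VV", "transmits"), ("NP", ("NN", "torque")))),
        ("IP", ("NP", ("DT", "the"), ("NN", "coupling")),
         ("VP", ("VP", ("VV", "transmits"), ("NP", ("NN", "torque"))),
          ("CC", "and"),
          ("VP", ("VV", "absorbs"), ("NP", ("NN", "vibration"))))),
        NOISE,
    ],
    [
        ("IP", ("NP", ("DT", "the"), ("NN", "pump")),
         ("VP", ("VV", "moves"), ("NP", ("NN", "fluid")))),
        NOISE,
    ],
]

print(document_features(EXAMPLE_DOCS, k=2))
# [[('transmits', 'torque'), ('transmits', 'torque'), ('absorbs', 'vibration')], [('moves', 'fluid')]]
```

In the toy corpus the caption-like sentence "see the figure" contains no feature word, so stage 1 removes it before stage 2 could turn it into the meaningless tuple ("see", "figure"). The coordinated VP in the second sentence shows why the extraction recurses level by level: the top VP has no verb child of its own, and the tuples only appear one level down.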

Last Update: 2019-05-10