GONG Jing, HU Ping-xia, HU Can. Improvement of Algorithm for Weight of Characteristic Item in Text Classification[J]. Computer Technology and Development, 2014, 24(09): 128-132.

 Improvement of Algorithm for Weight of Characteristic Item in Text Classification

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
24
Issue:
2014, No. 09
Pages:
128-132
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2014-09-10

Article Info

Title:
 Improvement of Algorithm for Weight of Characteristic Item in Text Classification

Article No.:
1673-629X(2014)09-0128-05
Author(s):
 GONG Jing, HU Ping-xia, HU Can
Affiliation:
 Department of Information Technology, Hunan Environmental Biological Polytechnic
Keywords:
 text classification; feature item; weight; improvement
CLC number:
TP301
Document code:
A
Abstract:
 TF-IDF is a commonly used weighting method in text classification. However, TF-IDF considers only how many times a feature item occurs in a text and how frequently that item appears in the training set; it ignores both the distribution of the feature item across classes and the item's semantic information. To address these shortcomings, an improved TF-IDF algorithm is proposed. It takes into account not only the distribution of a feature item within classes but also semantic factors such as the item's position and length, so it better reflects the importance of each feature item. Its effectiveness is verified with a Naïve Bayes classifier. Experimental results show that the proposed algorithm outperforms TF-IDF and effectively improves text-classification accuracy.
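The abstract's criticism of TF-IDF can be made concrete with the baseline weight. Below is a minimal Python sketch of classic TF-IDF, assuming the common form w(t, d) = tf(t, d) * log(N / n_t), where N is the number of training documents and n_t is the number of documents containing term t. The paper's own improved formula is not given on this page, so it is not reproduced here. The toy corpus, the class labels, and the function name `tf_idf` are hypothetical. Because the function never sees class labels, a term concentrated in one class and a term spread across classes receive the same weight when their counts match. The improved algorithm is meant to close this gap.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF weights, assuming w(t, d) = tf(t, d) * log(N / n_t).

    docs: list of tokenized documents (lists of strings).
    Returns one {term: weight} dict per document. tf is the raw count of
    the term in the document; n_t is the number of documents containing it.
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Hypothetical toy training set with class labels.
docs = [
    ["alpha", "beta", "goal"],   # sports
    ["alpha", "match"],          # sports
    ["beta", "stock"],           # finance
    ["market"],                  # finance
]
labels = ["sports", "sports", "finance", "finance"]  # never used by tf_idf

w = tf_idf(docs)
# "alpha" occurs only in sports documents, while "beta" occurs in one sports
# and one finance document. Both have tf = 1 and n_t = 2 in the first
# document, so TF-IDF assigns them the same weight, log(4 / 2).
print(math.isclose(w[0]["alpha"], w[0]["beta"]))  # True
```

Raw counts and the natural logarithm are used for simplicity. Normalized tf and smoothed idf are common variants, and none of them uses class labels either.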


Last Update: 2015-04-01