[1] DONG Mei, HU Xue-gang. Text Categorization Based on Multiple Features Selection[J]. Computer Technology and Development, 2007, (07): 117-119.

Chinese Text Categorization Based on Multiple Feature Selection

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2007, No. 07
Pages:
117-119
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
Text Categorization Based on Multiple Features Selection
Article ID:
1673-629X(2007)07-0117-03
Author(s):
DONG Mei, HU Xue-gang
School of Computer & Information, Hefei University of Technology
Keywords:
text categorization; feature selection; multiple features selection
CLC number:
TP18
Document code:
A
Abstract:
Automatic text categorization is the task of having a computer assign a text to its associated categories, within a predefined category system, based on the text's content. Feature selection is a key step in text categorization, and one of its difficulties is the high dimensionality of the feature space. Finding an effective feature selection method that reduces the dimensionality of the feature space has therefore become an important problem in text categorization. Based on an analysis of existing feature selection methods for text categorization, a multiple feature selection method that combines different feature selection methods is implemented and applied to the KNN text categorization algorithm. Experiments show that the multiple feature selection method gives clearly better classification performance than a single feature selection method.
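
The abstract does not state which feature selection methods are combined or how their scores are merged, so the following is only a minimal sketch of the general idea under stated assumptions: it assumes two common scorers (document frequency and the chi-square statistic), min-max normalizes each, averages them, keeps the top-ranked terms, and classifies with a cosine-similarity KNN. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def doc_freq_scores(docs, labels):
    """Document frequency: number of documents containing each term."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return dict(df)

def chi_square_scores(docs, labels):
    """Chi-square statistic per term, taking the maximum over classes."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in set(labels):
            a = sum(1 for d, y in zip(doc_sets, labels) if term in d and y == cls)
            b = sum(1 for d, y in zip(doc_sets, labels) if term in d and y != cls)
            c = sum(1 for d, y in zip(doc_sets, labels) if term not in d and y == cls)
            d_ = n - a - b - c
            denom = (a + c) * (b + d_) * (a + b) * (c + d_)
            if denom:
                best = max(best, n * (a * d_ - c * b) ** 2 / denom)
        scores[term] = best
    return scores

def normalize(scores):
    """Min-max scale scores to [0, 1] so different scorers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {t: (s - lo) / span for t, s in scores.items()}

def select_features(docs, labels, k, scorers=(doc_freq_scores, chi_square_scores)):
    """Assumed combination rule: average the normalized scores, keep the top k."""
    combined = Counter()
    for scorer in scorers:
        for term, s in normalize(scorer(docs, labels)).items():
            combined[term] += s / len(scorers)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]

def vectorize(doc, features):
    """Term-count vector restricted to the selected features."""
    counts = Counter(doc)
    return [counts[f] for f in features]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(train_docs, train_labels, doc, features, k=3):
    """Majority vote among the k most cosine-similar training documents."""
    q = vectorize(doc, features)
    sims = sorted(
        ((cosine(q, vectorize(d, features)), y) for d, y in zip(train_docs, train_labels)),
        key=lambda pair: -pair[0],
    )
    votes = Counter(y for _, y in sims[:k])
    return votes.most_common(1)[0][0]

# Toy, already-tokenized corpus (illustrative only).
train_docs = [
    ["stock", "market", "price", "rise"],
    ["stock", "price", "fall", "market"],
    ["bank", "market", "stock", "rate"],
    ["team", "match", "goal", "win"],
    ["team", "player", "goal", "score"],
    ["match", "player", "win", "league"],
]
train_labels = ["finance"] * 3 + ["sports"] * 3

features = select_features(train_docs, train_labels, k=4)
pred_finance = knn_predict(train_docs, train_labels, ["stock", "price", "market"], features)
pred_sports = knn_predict(train_docs, train_labels, ["team", "goal", "win"], features)
print(features, pred_finance, pred_sports)
```

Averaging normalized scores is just one plausible way to combine selectors; taking the union or intersection of each method's top-ranked terms would be an equally reasonable reading of "combining different feature selection methods". For Chinese text, the documents would first need word segmentation, which this sketch assumes has already been done.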


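Memo

Supported by the Natural Science Foundation of Anhui Province (050420207). DONG Mei (1977-), female, from Baoding, Hebei; master's student; research interest: data mining. HU Xue-gang, professor, Ph.D., master's supervisor; research interests: artificial intelligence, data mining, and data structures.
Last Update: 1900-01-01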