[1] FAN Heng-liang, CHENG Wei-qing. An Improved KNN Approach of Text Classification Based on Association Analysis[J]. Computer Technology and Development, 2014, 24(06): 71-74.

An Improved KNN Approach of Text Classification Based on Association Analysis

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
24
Issue:
No. 06, 2014
Pages:
71-74
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2014-06-30

Article Info

Title:
An Improved KNN Approach of Text Classification Based on Association Analysis
Article ID:
1673-629X(2014)06-0071-04
Author(s):
FAN Heng-liang, CHENG Wei-qing
Affiliation:
College of Computer, Nanjing University of Posts and Telecommunications
Keywords:
data mining; text classification; KNN; association analysis
CLC number:
TP301
Document code:
A
Abstract:
The KNN algorithm is widely applied in text classification, a branch of data mining. After analyzing the deficiencies of the traditional KNN method, this paper proposes an improved KNN algorithm based on association analysis. The method first extracts, for each class of training documents, the frequent feature sets of that class and the documents associated with them. When a document of unknown class is to be classified, the association analysis results are used to determine a suitable number of nearest neighbors k and to select the k nearest neighbors quickly from the labeled training documents; the class of the document is then decided by the classes of its neighbors. Compared with text classification based on traditional KNN, the improved method determines the value of k better and reduces time complexity. Experimental results show that the proposed method improves both the efficiency and the accuracy of text classification.
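
The abstract does not give the algorithm's details, so the following is only a minimal sketch of one way the described pipeline could look, not the authors' implementation. It makes several assumptions that are not in the paper: documents are plain sets of terms, frequent feature sets are enumerated by brute force up to `max_size` terms (rather than with Apriori or a similar miner), similarity is Jaccard overlap, and k is taken as the rounded square root of the candidate pool size, capped at `max_k`. All function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def mine_frequent_sets(docs, labels, min_support=0.6, max_size=2):
    """Map each per-class frequent term set to the training docs containing it.

    Support is measured inside one class: the fraction of that class's
    documents that contain every term of the set (an assumed definition).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    associated = defaultdict(set)  # frozenset of terms -> doc indices
    for members in by_class.values():
        holders_of = defaultdict(set)
        for idx in members:
            terms = sorted(docs[idx])
            for size in range(1, max_size + 1):
                for combo in combinations(terms, size):
                    holders_of[frozenset(combo)].add(idx)
        for itemset, holders in holders_of.items():
            if len(holders) / len(members) >= min_support:
                associated[itemset] |= holders
    return dict(associated)


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(doc, docs, labels, associated, max_k=5):
    """Return (predicted label, k) for one unlabeled document."""
    doc = set(doc)
    # Candidate neighbors: training docs tied to a frequent set found in doc.
    candidates = set()
    for itemset, holders in associated.items():
        if itemset <= doc:
            candidates |= holders
    if not candidates:
        candidates = set(range(len(docs)))  # fall back to plain KNN
    # Assumed heuristic: k grows with the size of the candidate pool.
    k = max(1, min(max_k, round(sqrt(len(candidates)))))
    ranked = sorted(candidates, key=lambda i: (-jaccard(doc, docs[i]), i))
    votes = Counter()
    for i in ranked[:k]:
        votes[labels[i]] += jaccard(doc, docs[i])  # similarity-weighted vote
    return votes.most_common(1)[0][0], k


# Toy training data, invented purely for illustration.
TRAIN_DOCS = [
    {"goal", "match", "team"},
    {"team", "coach", "match"},
    {"match", "referee", "goal"},
    {"stock", "market", "bank"},
    {"bank", "loan", "market"},
    {"market", "stock", "price"},
]
TRAIN_LABELS = ["sports"] * 3 + ["finance"] * 3

ASSOCIATED = mine_frequent_sets(TRAIN_DOCS, TRAIN_LABELS)
print(classify({"team", "goal", "win"}, TRAIN_DOCS, TRAIN_LABELS, ASSOCIATED))
```

In this sketch the similarity computation touches only the candidate documents rather than the whole training set, which is where a time saving of the kind the abstract reports would come from; the actual rule the paper uses to choose k may differ.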

