[1] CHEN Bin, SU Yi-dan, HUANG Shan. Classification of Imbalance Data Based on KM-SMOTE Algorithm and Random Forest[J]. Computer Technology and Development, 2015, 25(09): 17-21.

 Classification of Imbalance Data Based on KM-SMOTE Algorithm and Random Forest

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
25
Issue:
Issue 09, 2015
Pages:
17-21
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2015-09-10

Article Information

Title:
 Classification of Imbalance Data Based on KM-SMOTE Algorithm and Random Forest
Article ID:
1673-629X(2015)09-0017-05
Authors:
 CHEN Bin (陈斌), SU Yi-dan (苏一丹), HUANG Shan (黄山)
 School of Computer, Electronics and Information, Guangxi University
Keywords:
 K-means; SMOTE algorithm; random forest; imbalanced data set
CLC number:
TP301.6
Document code:
A
Abstract:
 A random forest trained on data balanced by the SMOTE algorithm handles the classification of imbalanced data sets well; it is a classifier that meets classification requirements by transforming the data. However, after SMOTE has processed an imbalanced data set, the overall distribution of the data may change and the boundary between the positive and negative classes may blur. These two defects can easily make the balanced data differ substantially from the original data set, so the classification results improve but remain unsatisfactory. The K-means algorithm clusters data effectively and thereby describes the data distribution. On this basis, this paper combines the K-means and SMOTE algorithms to exploit the advantages of both and proposes KM-SMOTE, a SMOTE algorithm based on K-means that effectively addresses the two problems above. The algorithm is then used with a random forest classifier in experiments, and the results show that the improved algorithm achieves better classification performance.
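
The idea in the abstract, oversampling the minority class only within K-means clusters so that synthetic points follow the original distribution, can be illustrated with a minimal sketch. The abstract does not give the paper's exact procedure, so everything below is an assumption: the function names `sq_dist`, `kmeans`, and `km_smote` are hypothetical, the clustering is plain Lloyd's K-means, and each synthetic point is interpolated between two random members of the same cluster (standard SMOTE uses nearest neighbours instead).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def km_smote(minority, n_new, k=2, seed=0):
    """Assumed KM-SMOTE variant: cluster the minority class, then create
    n_new synthetic points by interpolating inside a single cluster."""
    rng = random.Random(seed)
    labels = kmeans(minority, k, rng=rng)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c]
                for c in range(k)]
    # Interpolation needs at least two points in the cluster.
    clusters = [c for c in clusters if len(c) >= 2]
    if not clusters:
        raise ValueError("no cluster has two or more minority points")
    weights = [len(c) for c in clusters]  # larger clusters get more samples
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choices(clusters, weights=weights)[0]
        a, b = rng.sample(cluster, 2)
        t = rng.random()  # position along the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic
```

Because every synthetic point lies on a segment between two members of one cluster, new samples cannot fall in the empty space between distant minority groups, which is one plausible reading of how clustering limits boundary blurring. The original minority points plus the output of `km_smote` could then be combined with the majority class and passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.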

Last Update: 2015-10-16