LIU Yao-jie, LIU Du-yu. Research on Improved Random Forest Algorithm Based on Unbalanced Datasets[J]. Computer Technology and Development, 2019, 29(06): 100-104. [doi:10.3969/j.issn.1673-629X.2019.06.021]

Research on Improved Random Forest Algorithm Based on Unbalanced Datasets

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
No. 06, 2019
Pages:
100-104
Section:
Intelligence, Algorithms and Systems Engineering
Publication date:
2019-06-10

Article Info

Title:
Research on Improved Random Forest Algorithm Based on Unbalanced Datasets
Article number:
1673-629X(2019)06-0100-05
Author(s):
LIU Yao-jie, LIU Du-yu
School of Electrical and Information Engineering, Southwest Minzu University, Chengdu, Sichuan 610041, China
Keywords:
imbalanced data; random forest; decision tree; node split; classification accuracy
CLC number:
TP301.6
DOI:
10.3969/j.issn.1673-629X.2019.06.021
Abstract:
The random forest algorithm achieves good classification performance across many application scenarios and datasets. When it is applied to imbalanced binary classification datasets, however, it is limited by the skewed ratio of majority to minority samples in the data and by the leaf-node voting mechanism of its decision subtrees, so samples of the minority class, which make up a relatively small share of the data, are not reliably voted into the correct class. To address this, we improve the node splitting rule of the original random forest algorithm. During model training, the class proportions of the samples at a node and the depth of that node are considered together, adding classification information that favors the minority class and thereby raising its classification accuracy. Tests of the improved algorithm on different datasets show that it performs better than the traditional algorithm on imbalanced datasets, and that with large sample sizes the classification accuracy of the minority class improves significantly.
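The abstract says only that the improved split rule weighs the class proportions at a node together with the node's depth; the exact formula is not given on this page. As an illustration only, the sketch below uses a class-weighted Gini impurity (a standard cost-sensitive splitting technique, not necessarily the authors' rule) in which the minority class weight grows with node depth under a hypothetical linear schedule `1 + alpha * depth`. The function names, the parameter `alpha`, and the assumption that the minority label is `1` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence

Weights = Dict[Hashable, float]


def class_mass(labels: Sequence[Hashable], weights: Weights) -> Counter:
    """Sum the per-class weights in a node; classes absent from `weights` count 1.0 each."""
    mass: Counter = Counter()
    for y in labels:
        mass[y] += weights.get(y, 1.0)
    return mass


def weighted_gini(labels: Sequence[Hashable], weights: Weights) -> float:
    """Gini impurity on weighted class proportions (plain Gini when all weights are 1)."""
    mass = class_mass(labels, weights)
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def depth_weights(minority: Hashable, depth: int, alpha: float = 0.5) -> Weights:
    """HYPOTHETICAL schedule: the minority weight grows linearly with node depth."""
    return {minority: 1.0 + alpha * depth}


def split_score(left: Sequence[Hashable], right: Sequence[Hashable], depth: int,
                minority: Hashable = 1, alpha: float = 0.5) -> float:
    """Mass-weighted impurity of the two children of a candidate split (lower is better)."""
    w = depth_weights(minority, depth, alpha)
    m_left = sum(class_mass(left, w).values())
    m_right = sum(class_mass(right, w).values())
    total = m_left + m_right
    if total == 0:
        return 0.0
    return (m_left * weighted_gini(left, w) + m_right * weighted_gini(right, w)) / total


def leaf_vote(labels: Sequence[Hashable], depth: int,
              minority: Hashable = 1, alpha: float = 0.5) -> Hashable:
    """Predict the class with the largest weighted mass at a leaf."""
    mass = class_mass(labels, depth_weights(minority, depth, alpha))
    return max(mass, key=mass.get)


if __name__ == "__main__":
    leaf = [0, 0, 0, 1, 1]                       # 3 majority samples, 2 minority samples
    print(leaf_vote(leaf, depth=0))              # weights {1: 1.0}: plain majority vote -> 0
    print(leaf_vote(leaf, depth=2))              # weights {1: 2.0}: minority mass 4.0 > 3.0 -> 1
    print(split_score([0, 0], [0, 1], depth=0))  # (2*0.0 + 2*0.5) / 4 = 0.25
```

Because the same depth-dependent weights enter both the split score and the leaf vote, a deep leaf holding three majority and two minority samples can be assigned to the minority class, which targets the leaf-voting limitation described in the abstract. In practice a value such as `alpha` would have to be tuned, for example by cross-validating minority-class recall rather than overall accuracy.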


Last Update: 2019-06-10