[1] WANG Cheng, GAO Rui. An Improved Random Forest Algorithm Based on Feature Reduction[J]. Computer Technology and Development, 2020, 30(03): 40-45. [doi:10.3969/j.issn.1673-629X.2020.03.008]

An Improved Random Forest Algorithm Based on Feature Reduction

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
2020, No. 03
Pages:
40-45
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2020-03-10

Article Info

Title:
An Improved Random Forest Algorithm Based on Feature Reduction
Article ID:
1673-629X(2020)03-0040-06
Author(s):
WANG Cheng, GAO Rui
School of Telecommunications & Information Engineering,Nanjing University of Posts and Telecommunications,Nanjing 210003,China
Keywords:
random forest; weight sorting; feature reduction; sampling method; RW_RF algorithm
CLC number:
TP311
DOI:
10.3969/j.issn.1673-629X.2020.03.008
Abstract:
Although the random forest (RF) algorithm is widely used and achieves high classification accuracy, its performance degrades severely on data that is both high-dimensional and imbalanced. High-dimensional data usually contains many irrelevant and redundant features. To address this problem, we combine weight sorting with recursive feature screening and propose an improved random forest algorithm, RW_RF (ReliefF & wrapper random forest). First, the ReliefF algorithm assigns a weight to every feature in the dataset according to its ability to separate the positive and negative classes. Redundant low-weight features are then deleted recursively to obtain the feature subset with the best classification performance, which is used to construct the random forest. At the same time, the ReliefF sampling method is modified to mitigate the impact of imbalanced data on the classification model. Experimental results show that, on datasets with many features, every evaluation metric of the improved algorithm is higher than that of the original algorithm. This demonstrates that RW_RF effectively prunes the feature subset and reduces the influence of redundant features on classification accuracy, and that the improved algorithm is also effective to some extent on imbalanced data.
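The pipeline summarized in the abstract (weight the features, drop low-weight features step by step, keep the subset that scores best with a random forest) can be illustrated with a minimal Python sketch. This is not the authors' RW_RF implementation. It assumes binary labels, substitutes a simplified Relief weighting (one nearest hit and one nearest miss) for full ReliefF, and assumes that the modified sampling means drawing reference instances evenly from each class, which the abstract does not specify. The function names `relief_weights` and `backward_select`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def relief_weights(X, y, n_samples=100, seed=0):
    """Simplified Relief weights (one nearest hit, one nearest miss).

    Assumption: reference instances are drawn evenly from each class so
    the minority class is not under-represented. The abstract does not
    describe the paper's actual sampling scheme.
    """
    rng = np.random.default_rng(seed)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span        # scale each feature to [0, 1]
    classes = np.unique(y)
    per_class = max(1, n_samples // len(classes))
    weights = np.zeros(X.shape[1])
    n_used = 0
    for c in classes:
        idx_c = np.flatnonzero(y == c)
        for i in rng.choice(idx_c, size=per_class, replace=True):
            diff = np.abs(Xn - Xn[i])      # per-feature distance to every row
            dist = diff.sum(axis=1)
            dist[i] = np.inf               # exclude the instance itself
            same = y == y[i]
            hit = np.argmin(np.where(same, dist, np.inf))
            miss = np.argmin(np.where(~same, dist, np.inf))
            # A useful feature differs on the miss and agrees on the hit.
            weights += diff[miss] - diff[hit]
            n_used += 1
    return weights / n_used


def backward_select(X, y, weights, min_features=2, cv=3, seed=0):
    """Drop the lowest-weight feature one at a time; keep the best subset."""
    order = np.argsort(weights)[::-1]      # feature indices, highest weight first
    best_score, best_subset = -np.inf, order
    for k in range(len(order), min_features - 1, -1):
        subset = order[:k]
        rf = RandomForestClassifier(n_estimators=50, random_state=seed)
        score = cross_val_score(rf, X[:, subset], y, cv=cv, scoring="f1").mean()
        if score >= best_score:            # ties favour the smaller subset
            best_score, best_subset = score, subset
    return np.sort(best_subset), best_score


# Synthetic imbalanced data (80% / 20%) with only a few informative features.
X, y = make_classification(n_samples=300, n_features=12, n_informative=3,
                           n_redundant=2, weights=[0.8, 0.2], random_state=0)
w = relief_weights(X, y)
selected, score = backward_select(X, y, w)
print("selected features:", selected, "mean F1:", round(score, 3))
```

The sketch scores subsets with F1 rather than accuracy because accuracy can look high on imbalanced data even when the minority class is poorly recognized. The feature ranking is computed once and reused at every elimination step; recomputing the weights after each removal would be an equally plausible reading of "recursively".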

Last Update: 2020-03-10