[1] DAI Gui-yang, QI Xiu-li*, YU Xiao-han. Research on Random Forest Feature Selection Method by Human Knowledge[J]. Computer Technology and Development, 2022, 32(07): 155-160. [doi:10.3969/j.issn.1673-629X.2022.07.027]

Research on Random Forest Feature Selection Method by Human Knowledge

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
2022, Issue 07
Pages:
155-160
Section:
Application Frontiers and General Topics
Publication date:
2022-07-10

Article Info

Title:
Research on Random Forest Feature Selection Method by Human Knowledge
Article ID:
1673-629X(2022)07-0155-06
Author(s):
DAI Gui-yang, QI Xiu-li*, YU Xiao-han
School of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China
Keywords:
feature selection; random forest; human knowledge; fuzzy system; data dimensionality reduction
CLC number:
TP182;TP391
DOI:
10.3969/j.issn.1673-629X.2022.07.027
Abstract:
Feature selection picks the most effective features from the original feature space, reducing the dimensionality of the data and improving the performance of learning algorithms. In dimensionality reduction problems, common feature selection methods rely mainly on the statistical properties of the data and choose features using only the information in the data itself. In some practical problems, however, a large body of human experience has accumulated, and this human knowledge may strongly influence feature selection, yet few feature selection methods make use of it. For problems of this kind, a feature selection model based on two-stage screening with a random forest and a fuzzy system is proposed, which takes both human knowledge and the collected data into account. The model first uses the random forest algorithm to eliminate redundant features from the original data set, giving a preliminary selection. It then builds a fuzzy system from the human knowledge associated with the preliminarily selected features, computes an evaluation score for each of them, and screens out the final key features. Experiments on a real gasoline purification data set show a significant improvement over conventional feature selection methods, which verifies the effectiveness of random forest feature selection combined with human knowledge.
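The two-stage screening outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fuzzy system, so the triangular membership functions, the rule table, the zero-order Sugeno (weighted-average) scoring, the function names, and all feature names and numbers below are assumptions made for illustration. The random forest importances are taken as given inputs; in practice they might come from, for example, the `feature_importances_` attribute of a fitted scikit-learn random forest.

```python
from typing import Dict, List, Tuple

LABELS = ("low", "mid", "high")


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x: float) -> Dict[str, float]:
    """Map a value in [0, 1] to memberships in low / mid / high (assumed shapes)."""
    x = min(max(x, 0.0), 1.0)
    return {
        "low": tri(x, -0.5, 0.0, 0.5),
        "mid": tri(x, 0.0, 0.5, 1.0),
        "high": tri(x, 0.5, 1.0, 1.5),
    }


# Hypothetical rule table: (importance label, knowledge label) -> output score.
LEVEL = {"low": 0.0, "mid": 0.5, "high": 1.0}
RULES = {(i, k): (LEVEL[i] + LEVEL[k]) / 2 for i in LABELS for k in LABELS}


def fuzzy_score(importance: float, knowledge: float) -> float:
    """Zero-order Sugeno inference: min for AND, weighted average of rule outputs."""
    mu_i, mu_k = fuzzify(importance), fuzzify(knowledge)
    num = den = 0.0
    for (li, lk), out in RULES.items():
        w = min(mu_i[li], mu_k[lk])  # firing strength of this rule
        num += w * out
        den += w
    return num / den


def two_stage_select(
    importances: Dict[str, float],
    knowledge: Dict[str, float],
    n_prescreen: int,
    n_final: int,
) -> Tuple[List[str], Dict[str, float]]:
    """Stage 1: keep the top features by importance. Stage 2: rank by fuzzy score."""
    prelim = sorted(importances, key=importances.get, reverse=True)[:n_prescreen]
    top = max(importances[f] for f in prelim) or 1.0  # avoid division by zero
    scores = {
        # Features without an expert rating get a neutral 0.5 (an assumption).
        f: fuzzy_score(importances[f] / top, knowledge.get(f, 0.5))
        for f in prelim
    }
    final = sorted(scores, key=scores.get, reverse=True)[:n_final]
    return final, scores


# Made-up importances and expert relevance ratings in [0, 1], for illustration only.
importances = {"f1": 0.40, "f2": 0.30, "f3": 0.20, "f4": 0.08, "f5": 0.02}
knowledge = {"f1": 0.2, "f2": 0.9, "f3": 1.0, "f4": 1.0}

final, scores = two_stage_select(importances, knowledge, n_prescreen=3, n_final=2)
print(final, scores)
```

With these made-up inputs, `f1` has the highest importance but is rated as weakly relevant by the expert, so it ranks below `f2` and `f3` in the second stage, while `f4` is removed in the first stage regardless of its expert rating. This ordering of the two stages mirrors the abstract; a different rule table or defuzzification scheme would change the scores.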


Last Update: 2022-07-10