
Research on a Random Forest Feature Selection Method Incorporating Human Knowledge

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 32
Issue:: 2022, No. 07

Pages:: 155-160

Section:: Frontier Applications and Comprehensive Topics

Publication date:: 2022-07-10

Article Info

Title:: Research on Random Forest Feature Selection Method by Human Knowledge

Article ID:: 1673-629X(2022)07-0155-06

Author(s):: DAI Gui-yang; QI Xiu-li*; YU Xiao-han; School of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu 210007, China

Keywords:: feature selection; random forest; human knowledge; fuzzy system; data dimensionality reduction

CLC number:: TP182; TP391

DOI:: 10.3969/j.issn.1673-629X.2022.07.027

Abstract:: Feature selection picks the most effective features from the original feature space, reducing data dimensionality and improving the performance of learning algorithms. Common feature selection methods for dimensionality reduction rely mainly on the statistical properties of the data and choose features using only the information the data itself contains. In many practical problems, however, a large body of human experience has accumulated. This human knowledge may strongly influence which features matter, yet few feature selection methods take it into account. For problems of this kind, and to draw on both human knowledge and collected data, we propose a two-stage (secondary screening) feature selection model based on random forest and a fuzzy system. The model first uses the random forest algorithm to remove redundant features from the original data set as a preliminary screening. It then builds a fuzzy system from the human knowledge associated with the preliminarily selected features, computes an evaluation score for each of these features, and screens out the final key features. Experiments on a real-world gasoline purification data set show that the model achieves a significant improvement over conventional feature selection methods, verifying the effectiveness of random forest feature selection combined with human knowledge.
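The abstract describes the model only at a high level, so the following Python sketch illustrates the two-stage idea under stated assumptions; it is not the authors' implementation. Stage 1 stands in for the random forest with a deliberately shallow forest of depth-one regression trees (decision stumps), each grown on a bootstrap sample with a random subset of features, and ranks features by accumulated squared-error reduction; it assumes a numeric target. Stage 2 replaces the paper's unspecified fuzzy system with a hypothetical zero-order Sugeno system whose two inputs are the normalized stage-1 importance and an expert relevance rating in [0, 1]. The membership functions, the rule table `RULES`, and the names `forest_importance`, `fuzzy_score`, and `select_features` are all illustrative assumptions.

```python
import random
from statistics import fmean

# --- Stage 1: random-forest preliminary screening (simplified to stumps) ---

def _sse(ys):
    """Sum of squared deviations from the mean (regression impurity times n)."""
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)


def _best_split(X, y, features):
    """Best single-threshold split among `features`; returns (feature, SSE decrease)."""
    parent = _sse(y)
    best_f, best_gain = None, 0.0
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            gain = parent - _sse(left) - _sse(right)
            if gain > best_gain:
                best_f, best_gain = f, gain
    return best_f, best_gain


def forest_importance(X, y, n_trees=100, max_features=None, seed=0):
    """Normalized impurity-decrease importance from depth-1 regression trees,
    each grown on a bootstrap sample with a random subset of features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max_features or max(1, p // 3)
    imp = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        f, gain = _best_split([X[i] for i in rows], [y[i] for i in rows],
                              rng.sample(range(p), k))
        if f is not None:
            imp[f] += gain
    total = sum(imp) or 1.0
    return [v / total for v in imp]

# --- Stage 2: fuzzy scoring with expert knowledge (hypothetical rule base) ---

def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c allowed)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


SETS = {"low": (0.0, 0.0, 0.5), "mid": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}

# (importance level, expert level) -> crisp rule output; illustrative values only.
RULES = {
    ("high", "high"): 1.0, ("high", "mid"): 0.8, ("high", "low"): 0.5,
    ("mid", "high"): 0.7, ("mid", "mid"): 0.5, ("mid", "low"): 0.2,
    ("low", "high"): 0.4, ("low", "mid"): 0.2, ("low", "low"): 0.0,
}


def fuzzy_score(importance, expert):
    """Zero-order Sugeno inference: min for AND, weighted-average output."""
    num = den = 0.0
    for (li, le), out in RULES.items():
        w = min(tri(importance, *SETS[li]), tri(expert, *SETS[le]))
        num += w * out
        den += w
    return num / den if den else 0.0


def select_features(X, y, expert, n_keep_rf, n_final, seed=0):
    """Keep the top `n_keep_rf` features by forest importance, then the top
    `n_final` of those by fuzzy score. `expert[j]` is a 0-1 rating for feature j."""
    imp = forest_importance(X, y, seed=seed)
    top = max(imp) or 1.0
    stage1 = sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)[:n_keep_rf]
    scores = {j: fuzzy_score(imp[j] / top, expert[j]) for j in stage1}
    return sorted(stage1, key=lambda j: scores[j], reverse=True)[:n_final]


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random() for _ in range(6)] for _ in range(60)]
    y = [3 * r[0] + 2 * r[1] + 0.1 * rng.gauss(0, 1) for r in X]
    expert = [0.6, 0.9, 0.1, 0.1, 0.1, 0.1]  # hypothetical expert ratings
    print(select_features(X, y, expert, n_keep_rf=4, n_final=2))
```

In practice, stage 1 would more likely use a full random forest (for example, scikit-learn's `RandomForestRegressor` and its `feature_importances_` attribute), and the rule base would be elicited from domain experts rather than hard-coded as above.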
