[1] LI Chun-sheng, JIAO Hai-tao, LIU Peng, et al. Improvement and Application of C4.5 Decision Tree Classification Algorithm[J]. Computer Technology and Development, 2020, 30(05): 185-189. [doi:10.3969/j.issn.1673-629X.2020.05.035]

Improvement and Application of C4.5 Decision Tree Classification Algorithm

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
2020, No. 05
Pages:
185-189
Column:
Application Development Research
Publication date:
2020-05-10

Article Information

Title:
Improvement and Application of C4.5 Decision Tree Classification Algorithm
Article ID:
1673-629X(2020)05-0185-05
Author(s):
LI Chun-sheng (李春生), JIAO Hai-tao (焦海涛), LIU Peng (刘澎), LIU Xiao-gang (刘小刚)
School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China
Keywords:
decision tree; data probability; information gain ratio; time efficiency; improved algorithm
CLC number:
TP301.6
DOI:
10.3969/j.issn.1673-629X.2020.05.035
Abstract:
The decision tree algorithm builds a tree for data analysis from the probabilities with which samples having different features occur, and it is a classic method among data classification algorithms. First, every data feature is treated as a candidate tree node and all features are traversed; whenever a feature is visited, it is split and the data at each split point is recorded as the basis for measuring the purity of the resulting child nodes. Second, the recorded results are compared to determine the optimal feature and the optimal partition, and the sample dataset is split accordingly. Finally, a decision tree that conforms to the rules is built. To address the long time that the traditional C4.5 decision tree algorithm needs to compute the information gain ratio, an improved K-C4.5 algorithm is proposed. Drawing on the ideas of the Maclaurin formula and the Taylor formula, it converts the information gain ratio formula from a logarithmic form into a non-logarithmic form, thereby reducing computation time. Tests on real datasets verify that the improved algorithm is effective to a certain degree.
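
To make the idea in the abstract concrete, the following is a minimal Python sketch, not the paper's K-C4.5 formula, which the abstract does not state. It computes the standard C4.5 information gain ratio with logarithms, and then repeats the computation with a log-free entropy obtained from the first-order expansion ln(p) ≈ p - 1 (the Maclaurin series of ln(1 + x) truncated after the first term, with x = p - 1). The function names `entropy_exact`, `entropy_approx`, and `gain_ratio`, and the toy data, are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy_exact(items):
    """Shannon entropy in bits: -sum(p * log2(p)) over the value frequencies."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def entropy_approx(items):
    """Log-free entropy estimate (illustrative assumption, not the paper's formula).

    Uses ln(p) ~ p - 1, so -sum(p * log2(p)) ~ sum(p * (1 - p)) / ln(2).
    The constant ln(2) is a fixed scale factor, not a per-sample logarithm.
    """
    n = len(items)
    return sum((c / n) * (1 - c / n) for c in Counter(items).values()) / math.log(2)


def gain_ratio(values, labels, entropy=entropy_exact):
    """Gain ratio of a categorical feature: information gain / split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information is the entropy of the feature's own value distribution.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): "outlook" separates the classes, "coin" does not.
labels = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rain", "rain"]
coin = ["a", "b", "a", "b"]

for name, feature in (("outlook", outlook), ("coin", coin)):
    exact = gain_ratio(feature, labels, entropy_exact)
    approx = gain_ratio(feature, labels, entropy_approx)
    print(name, exact, approx)
```

On this toy data both variants rank the features the same way. A first-order expansion is only accurate when the probabilities are close to 1, so in general the approximate values differ from the exact ones, and whether the preferred split is preserved has to be checked on the dataset at hand.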


Last Update: 2020-05-10