[1] LIU Jun-cheng (刘俊成), DONG Dong (董东). Extended Isolation Forest Algorithm Based on Relative Proportion[J]. Computer Technology and Development, 2023, 33(06): 16-21. [doi:10.3969/j.issn.1673-629X.2023.06.003]

Extended Isolation Forest Algorithm Based on Relative Proportion

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
33
Issue:
2023, No. 06
Pages:
16-21
Section:
Big Data and Cloud Computing
Publication date:
2023-06-10

Article Info

Title:
Extended Isolation Forest Algorithm Based on Relative Proportion
Article ID:
1673-629X(2023)06-0016-06
Author(s):
LIU Jun-cheng (刘俊成), DONG Dong (董东)
School of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
Keywords:
big data mining; outlier detection; local outliers; extended isolation forest algorithm; relative proportion
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2023.06.003
Abstract:
Local outliers are masked by normal points of similar density and are therefore hard to isolate, so the extended isolation forest algorithm (EIF) identifies them poorly. To address this problem, a Relative Proportion-Extended Isolation Forest algorithm (RP-EIF) is proposed. The algorithm still partitions the data with hyperplanes defined by random slopes and random intercepts to generate the isolation forest, but it ranks outlier scores by the relative proportion of the leaf node into which a predicted sample falls to that leaf's parent node, rather than by path length. Replacing the global ranking with a local ranking that accounts for the data distribution in the neighborhood makes the algorithm more sensitive to local outliers and also reduces its time complexity. The effectiveness and efficiency of RP-EIF are evaluated on 5 public datasets from the Outlier Detection Databases (ODDS) and compared with the EIF, GIF, iForest, COPOD, and LOF algorithms. The experiments show that the accuracy of RP-EIF on the 5 ODDS public datasets is 1 to 4 percentage points higher than that of EIF and 2 to 38 percentage points higher than that of the other 5 algorithms. Moreover, RP-EIF takes about 30% less time than EIF on the 5 datasets.
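The scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formula, so the code assumes the score of a sample in one tree is the training size of its leaf divided by the size of that leaf's parent, averaged over all trees, with a smaller value read as more outlying. All function names (`build_tree`, `leaf_ratio`, `build_forest`, `rp_score`), the retry rule for cuts that leave one side empty, and the default parameters are hypothetical choices made for this illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def build_tree(points, depth, max_depth, rng, retries=20):
    """Grow one tree with random hyperplane cuts (random slope and intercept)."""
    n = len(points)
    if n <= 1 or depth >= max_depth:
        return {"size": n}
    dim = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]
    for _ in range(retries):
        # Random slope: a normal vector with standard normal components.
        normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Random intercept: a point drawn uniformly inside the bounding box.
        intercept = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        offset = dot(normal, intercept)
        left = [p for p in points if dot(normal, p) <= offset]
        right = [p for p in points if dot(normal, p) > offset]
        # Assumption of this sketch: redraw cuts that leave one side empty.
        if left and right:
            return {
                "size": n,
                "normal": normal,
                "offset": offset,
                "left": build_tree(left, depth + 1, max_depth, rng, retries),
                "right": build_tree(right, depth + 1, max_depth, rng, retries),
            }
    return {"size": n}  # no usable cut found, e.g. all points identical


def leaf_ratio(tree, x):
    """Assumed relative proportion: leaf size divided by its parent's size."""
    parent_size = None
    node = tree
    while "normal" in node:
        parent_size = node["size"]
        go_left = dot(node["normal"], x) <= node["offset"]
        node = node["left"] if go_left else node["right"]
    if parent_size is None:  # the root itself is a leaf
        return 1.0
    return node["size"] / parent_size


def build_forest(points, n_trees=100, sample_size=256, seed=0):
    """Build trees on random subsamples of a list of equal-length tuples."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(max(size, 2)))
    return [
        build_tree(rng.sample(points, size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]


def rp_score(forest, x):
    """Average leaf-to-parent ratio; smaller is read as more outlying here."""
    return sum(leaf_ratio(tree, x) for tree in forest) / len(forest)
```

Under these assumptions, a sample cut off alone from a large node receives a ratio close to zero, while a sample that ends in a leaf shared with, or only slightly smaller than, its parent receives a ratio close to one. Scoring needs only the sizes of the final leaf and its parent, not a path-length normalization term.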


Last Update: 2023-06-10