[1] WANG Cheng, DI Xuan. Research and Parallelization of Isolation Forest Algorithm[J]. Computer Technology and Development, 2021, 31(06): 13-18. [doi:10.3969/j.issn.1673-629X.2021.06.003]

Research and Parallelization of Isolation Forest Algorithm

Computer Technology and Development [ISSN:1673-629X/CN:61-1281/TN]

Volume:
31
Issue:
2021, No. 06
Pages:
13-18
Section:
Big Data Analysis and Mining
Publication date:
2021-06-10

Article Info

Title:
Research and Parallelization of Isolation Forest Algorithm
Article ID:
1673-629X(2021)06-0013-06
Author(s):
WANG Cheng, DI Xuan
School of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
Keywords:
anomaly detection; Isolation Forest algorithm; isolation tree; Spark; parallelization
CLC number:
TP301.6
DOI:
10.3969/j.issn.1673-629X.2021.06.003
Abstract:
Anomaly detection has been one of the popular research topics in data mining in recent years. The Isolation Forest algorithm is an efficient unsupervised anomaly detection algorithm that handles high-dimensional, large-scale data well. However, when it computes the anomaly score of a test sample, it uses the sample's average path length over the whole forest, which ignores differences in the anomaly detection ability of individual isolation trees. In addition, building a large number of isolation trees on large-scale data consumes a large amount of memory and time. To address these two shortcomings, a parallelized improved Isolation Forest algorithm is proposed. The standard deviation of the path lengths of each isolation tree is used to weight that tree when computing the anomaly score, and the algorithm is parallelized on the Spark platform. Comparison experiments on public datasets, together with parallel performance experiments under multiple parameter configurations, show that the proposed algorithm improves the accuracy of anomaly detection, has good parallel performance, and can efficiently process large-scale datasets that require building a large number of isolation trees.
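The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so the code below assumes that a tree's weight is proportional to the standard deviation of the path lengths of its own subsample, normalized over the forest. All function names (`fit_tree`, `fit_forest`, `anomaly_score`) are hypothetical, and a thread pool stands in for Spark only to show that trees are independent tasks; it is not expected to speed up this pure-Python code.

```python
import math
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant attribute
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the usual correction for unsplit leaves."""
    if "size" in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)

def fit_tree(data, sample_size, seed):
    """Build one isolation tree and its (assumed) raw weight."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    max_depth = math.ceil(math.log2(max(len(sample), 2)))
    tree = build_tree(sample, 0, max_depth, rng)
    # ASSUMPTION: raw weight = std. dev. of path lengths over the tree's own subsample.
    spread = statistics.pstdev([path_length(tree, p) for p in sample])
    return tree, spread

def fit_forest(data, n_trees=50, sample_size=64, seed=0):
    """Build trees as independent tasks (thread pool as a stand-in for Spark)."""
    seeds = [seed + i for i in range(n_trees)]
    with ThreadPoolExecutor() as pool:
        fitted = list(pool.map(lambda s: fit_tree(data, sample_size, s), seeds))
    total = sum(w for _, w in fitted)
    if total == 0:  # fall back to the unweighted average
        forest = [(t, 1.0 / n_trees) for t, _ in fitted]
    else:
        forest = [(t, w / total) for t, w in fitted]
    return forest, min(sample_size, len(data))

def anomaly_score(forest, psi, x):
    """Score in (0, 1]; a weighted mean replaces the plain average path length."""
    mean_path = sum(w * path_length(t, x) for t, w in forest)
    return 2.0 ** (-mean_path / c(psi))

rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
forest, psi = fit_forest(data)
s_in = anomaly_score(forest, psi, (0.0, 0.0))    # point inside the dense cluster
s_out = anomaly_score(forest, psi, (8.0, 8.0))   # point far from the cluster
```

With equal weights the score reduces to the standard Isolation Forest score. Because each tree is built from its own seed and subsample, the same per-tree function could in principle be mapped over a distributed collection of seeds on a platform such as Spark.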

Last Update: 2021-06-10