[1] GAO Ya-xing, ZHAO Xu-jun, CAO Xu-yang. An Outlier Detection Algorithm Based on Fusion Data Self-representation[J]. Computer Technology and Development, 2023, 33(12): 41-48. [doi:10.3969/j.issn.1673-629X.2023.12.006]

An Outlier Detection Algorithm Based on Fusion Data Self-representation

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
33
Issue:
No. 12, 2023
Pages:
41-48
Column:
Big Data and Cloud Computing
Publication date:
2023-12-10

Article Info

Title:
An Outlier Detection Algorithm Based on Fusion Data Self-representation
Article ID:
1673-629X(2023)12-0041-08
Author(s):
GAO Ya-xing, ZHAO Xu-jun, CAO Xu-yang
School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
Keywords:
outlier detection; data self-representation; feature grouping; information entropy; random walk
CLC number:
TP311.13
DOI:
10.3969/j.issn.1673-629X.2023.12.006
Abstract:
Data self-representation methods can be used for outlier detection because they magnify both the differences and the correlations among data points. However, existing techniques do not account for the influence of correlations among features on outlier detection, so they cannot be applied to high-dimensional data. To address this problem, an outlier detection algorithm based on fusion data self-representation is proposed, which can effectively detect outliers in high-dimensional data. First, a data self-representation method based on feature correlation is proposed; it combines mutual information and information entropy theory to measure the correlation among features of high-dimensional data and integrates this measure into the sparse representation process among data points, reflecting the complex relationships both among features and among data points. Second, a calculation method for fusing the data self-representations of different feature groups is proposed: the self-representation matrices corresponding to the different feature groups are combined by point multiplication to form a global data self-representation matrix. Finally, an outlier detection algorithm based on fusion data self-representation is proposed, in which outliers are detected by a graph random walk on the directed weighted graph formed by the global data self-representation matrix. Experimental results show that the detection performance of the proposed algorithm on both real and synthetic datasets is higher than that of the comparison algorithms, demonstrating that the algorithm has good generalization and stability.
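
The last two steps of the abstract (fusing per-group self-representation matrices, then scoring points by a graph random walk) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: "point multiplication" is read as an element-wise product of absolute coefficients, the walk is a generic random walk with restart, and the function names, the `damping` parameter, and the toy matrices are hypothetical. The feature-grouping and sparse-representation steps are not reproduced.

```python
def fuse_self_representations(matrices):
    """Element-wise product of the absolute values of same-shaped square matrices.

    Assumption: this is one reading of the 'point multiplication' fusion
    mentioned in the abstract, not the paper's confirmed formula.
    """
    n = len(matrices[0])
    fused = [[1.0] * n for _ in range(n)]
    for m in matrices:
        for i in range(n):
            for j in range(n):
                fused[i][j] *= abs(m[i][j])
    return fused


def random_walk_scores(weights, damping=0.1, iters=200):
    """Stationary visiting probabilities of a random walk with restart.

    weights[i][j] is the weight of the directed edge i -> j. A low
    probability means the point is rarely reached, i.e. a likely outlier.
    """
    n = len(weights)
    # Row-normalise into transition probabilities; a node with no
    # outgoing weight jumps uniformly to any node.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [
            damping / n
            + (1.0 - damping) * sum(pi[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return pi


# Hypothetical coefficients for two feature groups over four points.
# Row i holds how strongly point i is represented by each other point;
# no point relies on point 3 in both groups.
group_a = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.4, 0.3, 0.3, 0.0],
]
group_b = [
    [0.0, 0.6, 0.4, 0.1],
    [0.5, 0.0, 0.5, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.3, 0.3, 0.4, 0.0],
]

fused = fuse_self_representations([group_a, group_b])
scores = random_walk_scores(fused)
outlier = min(range(len(scores)), key=scores.__getitem__)
```

In this toy input, point 3 has no incoming edge weight after fusion, so the walk reaches it only through the restart term and it receives the lowest score. The restart term is included so that the iteration converges even when the graph is not strongly connected.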


Last Update: 2023-12-10