[1] CAO Xia, ZHENG Ai-yu, HAO Jing. Adaptive Distance Based Outlier Detection Algorithm[J]. Computer Technology and Development, 2024, 34(09): 138-146. [doi:10.20165/j.cnki.ISSN1673-629X.2024.0137]

Adaptive Distance Based Outlier Detection Algorithm

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
34
Issue:
2024, No. 09
Pages:
138-146
Section:
Artificial Intelligence
Publication date:
2024-09-10

Article Info

Title:
Adaptive Distance Based Outlier Detection Algorithm
Article ID:
1673-629X(2024)09-0138-09
Author(s):
CAO Xia (曹霞), ZHENG Ai-yu (郑爱宇), HAO Jing (郝静)
School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
Keywords:
data mining; outlier detection; attribute contribution factor; density distribution; adaptive distance
Classification number:
TP311.1
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0137
Abstract:
Neighbour-based outlier detection methods identify outliers from the neighbours surrounding each data object, but they are highly sensitive to the threshold parameter and mostly perform well only when the data follow a single distribution. To address the difficulty of detecting outliers in data with diverse distributions and the sensitivity to threshold parameters, an adaptive distance-based outlier detection algorithm is proposed. First, the contribution factors of the data attributes are adjusted dynamically so that key attributes have greater influence on outlier detection and the association between key attributes and outliers is reflected accurately. Second, the distance between data objects is computed from both the attribute contribution factors and the density, so as to better capture the positional relationships between data objects and their density distribution. Finally, to reduce the influence of the threshold parameter, the neighbourhood size is increased step by step, the change in each data object's adaptive distance is computed at each step, and these changes are accumulated as the outlier score. Experiments on synthetic and public datasets show that the proposed algorithm achieves higher detection accuracy.
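The idea of scoring objects over a growing neighbourhood, rather than at one fixed neighbourhood size, can be illustrated with a small sketch. This is not the algorithm from the paper: the abstract does not give the formulas for the attribute contribution factors, the density term, or the adaptive distance, so the code below makes loud simplifying assumptions. The contribution factors are a fixed `weights` vector (equal by default), the density term is omitted, and the score simply accumulates the mean weighted distance to the k nearest neighbours for k = 1..`k_max`. The names `weighted_distance`, `accumulated_knn_scores`, `weights` and `k_max` are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance; `weights` stands in for per-attribute contribution factors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def accumulated_knn_scores(points, weights=None, k_max=3):
    """Illustrative outlier score (an assumption, not the paper's formula).

    For each object, the mean weighted distance to its k nearest neighbours
    is computed for k = 1..k_max and the values are summed, so that no
    single neighbourhood size determines the score.
    """
    n = len(points)
    if not 1 <= k_max < n:
        raise ValueError("k_max must satisfy 1 <= k_max < number of objects")
    dim = len(points[0])
    if weights is None:
        # Assumption: every attribute contributes equally.
        weights = [1.0 / dim] * dim

    scores = []
    for i, p in enumerate(points):
        # Distances from object i to every other object, nearest first.
        dists = sorted(
            weighted_distance(p, q, weights)
            for j, q in enumerate(points)
            if j != i
        )
        running_sum = 0.0
        score = 0.0
        for k in range(1, k_max + 1):
            running_sum += dists[k - 1]
            score += running_sum / k  # mean distance to the k nearest neighbours
        scores.append(score)
    return scores


if __name__ == "__main__":
    # A small cluster plus one distant object (toy data, made up for illustration).
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    for point, score in zip(data, accumulated_knn_scores(data, k_max=3)):
        print(point, round(score, 3))
```

Objects with larger scores are farther from their neighbours across all tested neighbourhood sizes. Replacing the fixed `weights` with data-driven contribution factors and adding a density term would be needed to approach what the abstract describes.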

Last Update: 2024-09-10