[1] ZHU Dong-sheng, WU Qing-bo, TAN Yu-song. Research on Frequency-based Outlier Mining [J]. Computer Technology and Development, 2013, (05): 10-13.

Research on Frequency-based Outlier Mining

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
Issue:
2013, No. 05
Pages:
10-13
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
1900-01-01

Article Information

Title:
Research on Frequency-based Outlier Mining
Article ID:
1673-629X(2013)05-0010-04
Author(s):
ZHU Dong-sheng, WU Qing-bo, TAN Yu-song
College of Computer, National University of Defense Technology
Keywords:
outlier detection; frequent itemsets; distance-based; Greenplum
Document code:
A
Abstract:
Distance-based outlier detection algorithms have important applications in many fields, but their low efficiency limits their wider use. To address this problem, this paper analyzes index-based and cell-based detection algorithms and, inspired by frequent itemset mining and applying statistical principles, proposes an improved distance-based outlier detection algorithm (Index Unit Based-on-Distance Outlier Mining, IU-BDOM). In the data set to be mined, the fewer times an itemset occurs, the more likely it is to be an outlier; that is, the lower the frequency count, the more likely the item is an outlier. The algorithm therefore starts detection from the items with the lowest frequency, which avoids the cost of examining high-frequency data that cannot be outliers. To speed up detection further and make the algorithm parallel, a hypercube replaces the traditional hypersphere when counting the neighbors of each object o in the data set, so that the distance computation is split across dimensions and carried out independently on each, with different dimensions given different weights. In addition, the mining task is parallelized on the Greenplum distributed database, which greatly improves mining efficiency. Experiments confirm the effectiveness of these improvements.
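
The abstract names two ideas without giving the algorithm itself: visiting candidates in order of increasing frequency, and counting neighbors inside a hypercube whose sides are tested one dimension at a time. The Python sketch below illustrates only those two ideas and is not the authors' IU-BDOM implementation. The function names, the parameters `d` (neighborhood size) and `max_neighbors` (neighbor threshold), and the choice to apply each weight as a multiplier on the per-dimension difference are all assumptions made for illustration. The pruning step relies on a simple fact: an object that occurs k times has at least k - 1 neighbors at distance zero.

```python
from collections import Counter

def in_hypercube(a, b, weights, d):
    """Return True if b lies in the weighted hypercube centred on a.

    Each dimension is tested independently, so no cross-dimension
    sum (as in a Euclidean hypersphere test) is required.
    The weighting scheme here is an assumption, not the paper's.
    """
    return all(w * abs(x - y) <= d for x, y, w in zip(a, b, weights))

def find_outliers(data, weights, d, max_neighbors):
    """Return distinct objects with at most max_neighbors neighbors.

    data is a list of equal-length numeric tuples (hypothetical input format).
    """
    freq = Counter(data)
    outliers = []
    # Visit distinct objects from the lowest frequency to the highest.
    for obj, count in sorted(freq.items(), key=lambda kv: kv[1]):
        if count - 1 > max_neighbors:
            # Its own duplicates already exceed the threshold, and every
            # remaining object is at least as frequent, so stop here.
            break
        neighbors = -1  # start at -1 so that obj does not count itself
        for other, other_count in freq.items():
            if in_hypercube(obj, other, weights, d):
                neighbors += other_count
                if neighbors > max_neighbors:
                    break  # already too many neighbors to be an outlier
        if neighbors <= max_neighbors:
            outliers.append(obj)
    return outliers
```

For example, with five copies of `(0.0, 0.0)`, four copies of `(0.1, 0.1)`, and a single `(10.0, 10.0)`, calling `find_outliers(data, (1.0, 1.0), 1.0, 2)` examines only the single point in full; the two frequent values are skipped by the frequency check. Because each dimension is tested separately, the per-dimension comparisons could in principle be evaluated in parallel, which is the property the abstract attributes to the hypercube formulation.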
