
[1] GAO Yu, TIAN Feng, WU Zhen-qiang. A DPk-medoids Clustering Algorithm with Differential Privacy Protection[J]. Computer Technology and Development, 2017, 27(10): 117-120.

A DPk-medoids Clustering Algorithm with Differential Privacy Protection


Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 27
Issue:: No. 10, 2017

Pages:: 117-120

Column:: Security and Prevention

Publication date:: 2017-10-10

Article Info

Title:: A DPk-medoids Clustering Algorithm with Differential Privacy Protection

Article ID:: 1673-629X(2017)10-0117-04

Author(s):: GAO Yu[1][2]; TIAN Feng[2]; WU Zhen-qiang[1][2]

Affiliations:: 1. Key Laboratory of Modern Teaching Technology, Ministry of Education; 2. School of Computer Science, Shaanxi Normal University

Keywords:: data mining; privacy preserving; differential privacy; k-medoids clustering

CLC number:: TP309.2

Document code:: A

Abstract:: Cluster analysis is an important research field in data mining. Because it can reveal the internal structure of data and support deeper analysis or preprocessing, it is used in many fields, including image processing and pattern recognition. If organizations that hold large datasets (such as medical institutions) apply mining tools to user data to extract personal information, users' sensitive information may be exposed. To address this, a DPk-medoids clustering algorithm based on differential privacy protection is proposed, drawing on the properties of differential privacy. Each time center points are to be released, the algorithm first adds noise to the true center points using the Laplace mechanism and then releases only the noised center points, which to a certain degree ensures both the security of personal privacy and the effectiveness of the clustering. Simulation experiments on real datasets show that the proposed algorithm adapts to datasets of different scales and dimensions, and that once the privacy budget reaches a certain value, the effectiveness ratio of the DPk-medoids algorithm to the original clustering algorithm lies in the range 0.9~1.
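The noising step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `dp_k_medoids`, the even split of the privacy budget across iterations, the random initialization, and the caller-supplied `sensitivity` value are all assumptions made for illustration, since this page does not give those details. The sketch only shows the general pattern of computing true medoids and perturbing them with Laplace noise of scale sensitivity/ε before they are released; it is not a privacy proof.

```python
import numpy as np


def dp_k_medoids(points, k, epsilon, sensitivity, iters=5, seed=0):
    """Illustrative k-medoids loop that releases only Laplace-noised centers.

    Assumptions (not taken from the paper): the total budget `epsilon` is
    split evenly over `iters` rounds, and `sensitivity` is supplied by the
    caller for the (suitably normalized) data.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    eps_per_iter = epsilon / iters          # assumed even budget split
    scale = sensitivity / eps_per_iter      # Laplace scale b = sensitivity / epsilon
    # Hypothetical initialization: k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest current (noised) center.
        dist_to_centers = np.linalg.norm(
            points[:, None, :] - centers[None, :, :], axis=2
        )
        labels = dist_to_centers.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) == 0:
                continue  # empty cluster: keep the previous center
            # True medoid: the member with the smallest total distance
            # to all other members of its cluster.
            pairwise = np.linalg.norm(
                members[:, None, :] - members[None, :, :], axis=2
            )
            medoid = members[pairwise.sum(axis=1).argmin()]
            # Perturb the true medoid before it is used or released.
            centers[j] = medoid + rng.laplace(0.0, scale, size=medoid.shape)
    return centers
```

A smaller `epsilon` gives a larger noise scale and therefore stronger perturbation of the released centers, which matches the abstract's observation that clustering effectiveness approaches that of the original algorithm only once the privacy budget is large enough.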
