
Research on Clustering Mining Strategy Based on Medical Data Sets

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 30
Issue:: 2020, No. 07

Pages:: 66-70

Section:: Intelligence, Algorithms, Systems Engineering

Publication date:: 2020-07-10

Article Info

Title:: Research on Clustering Mining Strategy Based on Medical Data Sets

Article ID:: 1673-629X(2020)07-0066-05

Author(s):: WANG Yan-e (王艳娥)¹; AN Jian (安健)²; WANG Hong-gang (王红刚)¹; DING Xin-an (丁心安)¹; YANG Qian (杨倩)¹; 1. School of Science and Technology, Xi’an Siyuan University, Xi’an 710038, China; 2. Shenzhen Research Institute of Xi’an Jiaotong University, Shenzhen 518057, China

Keywords:: medical data; K-medoids algorithm; clustering; density optimization; variance

CLC number:: TP311.5

DOI:: 10.3969/j.issn.1673-629X.2020.07.015

Abstract:: This paper studies the partitioning clustering algorithm K-medoids on medical data sets. To address the random selection of initial cluster centers, slow convergence, and unstable clustering results of K-medoids, a variance-based density optimization algorithm is proposed. Starting from the mean square deviation and the mean distance of the sample set, the algorithm computes a density radius for the sample set according to its size; under the same density radius, samples in dense regions have higher density. Samples from different high-density regions are dynamically selected as the initial cluster centers, and local optimization during clustering accelerates convergence, overcoming the shortcomings of traditional K-medoids. The algorithm is tested on medical data sets from the UCI Machine Learning Repository. The experiments show that the selected initial cluster centers lie in dense regions of the sample set and better match the original distribution of the data, and that on the breast cancer data set the algorithm achieves higher clustering accuracy, more stable clustering results, and faster convergence.
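The initialization idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the density-radius formula, so `density_radius` below uses a placeholder combination of the three named quantities (mean square deviation, mean distance, sample count). The function names, the tie-breaking rule, and the toy data are likewise assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return the full symmetric matrix of Euclidean distances."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist


def density_radius(points, dist):
    """Hypothetical radius; the paper's actual formula is not in the abstract.

    ASSUMPTION: (RMS deviation from the centroid + mean pairwise distance)
    divided by sqrt(n), so the radius shrinks as the sample set grows.
    """
    n, dim = len(points), len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    rms_dev = math.sqrt(sum(math.dist(p, centroid) ** 2 for p in points) / n)
    pairs = n * (n - 1) / 2
    mean_dist = sum(dist[i][j] for i, j in combinations(range(n), 2)) / pairs
    return (rms_dev + mean_dist) / math.sqrt(n)


def select_initial_medoids(points, k):
    """Pick up to k initial medoids from different dense regions."""
    n = len(points)
    dist = pairwise_distances(points)
    radius = density_radius(points, dist)
    # Neighbours of each sample inside the shared density radius.
    neighbours = [
        [j for j in range(n) if j != i and dist[i][j] <= radius]
        for i in range(n)
    ]

    def rank(i):
        # More neighbours first; ties go to the sample that sits closer
        # to its neighbours on average (assumed tie-breaking rule).
        nb = neighbours[i]
        mean_nb = sum(dist[i][j] for j in nb) / len(nb) if nb else math.inf
        return (-len(nb), mean_nb, i)

    candidates = set(range(n))
    medoids = []
    while candidates and len(medoids) < k:
        best = min(candidates, key=rank)
        medoids.append(best)
        # Exclude the chosen region so the next medoid comes from another one.
        candidates -= {best, *neighbours[best]}
    return medoids


# Toy data (not from the paper): two well-separated groups of five samples.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
]
print(select_initial_medoids(points, 2))
```

On this toy data the sketch selects the central sample of each group (indices 4 and 9). If fewer than `k` dense regions are found, it returns fewer than `k` medoids; a fuller implementation would need a fallback for that case. The sketch also omits the local-optimization step that the abstract credits with faster convergence.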

