[1] WANG Yan-e, AN Jian, WANG Hong-gang, et al. Research on Clustering Mining Strategy Based on Medical Data Sets[J]. Computer Technology and Development, 2020, 30(07): 66-70. [doi:10.3969/j.issn.1673-629X.2020.07.015]

Research on Clustering Mining Strategy Based on Medical Data Sets

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
30
Issue:
2020, No. 07
Pages:
66-70
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2020-07-10

Article Info

Title:
Research on Clustering Mining Strategy Based on Medical Data Sets
Article ID:
1673-629X(2020)07-0066-05
Author(s):
WANG Yan-e (王艳娥)¹, AN Jian (安健)², WANG Hong-gang (王红刚)¹, DING Xin-an (丁心安)¹, YANG Qian (杨倩)¹
1. School of Science and Technology, Xi'an Siyuan University, Xi'an 710038, China; 2. Shenzhen Research Institute of Xi'an Jiaotong University, Shenzhen 518057, China
Keywords:
medical data; K-medoids algorithm; clustering; density optimization; variance
CLC number:
TP311.5
DOI:
10.3969/j.issn.1673-629X.2020.07.015
Abstract:
This paper studies the partitioning clustering algorithm K-medoids on medical data sets. To address the algorithm's random selection of initial cluster centers, slow convergence, and unstable clustering results, a variance-based density optimization algorithm is proposed. Starting from the mean square deviation and the mean distance of the sample set, the algorithm computes a density radius for the sample set according to its size; for a given density radius, samples in dense regions have higher density. Samples from different high-density regions are dynamically selected as the initial cluster centers, and local optimization during clustering accelerates convergence, overcoming the shortcomings of traditional K-medoids. The optimized algorithm is tested on medical data sets from the UCI machine learning repository. The experiments show that the initial cluster centers it selects lie in dense regions of the sample set and so better match the original distribution of the data, and that on the breast cancer data set it achieves relatively high clustering accuracy, stable clustering results, and fast convergence.
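The abstract describes the approach only in words, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. The radius formula in `density_radius` is hypothetical (the abstract says only that the radius depends on the mean distance, the mean square deviation, and the sample size), the tie-breaking rule in `init_medoids` is an assumption, and the "local optimization" step is replaced here by the standard alternating assign-and-update K-medoids iteration. All function names are illustrative.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def density_radius(D):
    """Hypothetical density radius (NOT the paper's formula).

    Combines the mean and standard deviation of the pairwise
    distances and scales them by the sample size n.
    """
    n = D.shape[0]
    d = D[np.triu_indices(n, k=1)]  # each unordered pair once
    return (d.mean() + d.std()) / np.sqrt(n)


def init_medoids(D, k):
    """Pick k initial medoids from different high-density regions."""
    r = density_radius(D)
    within = D <= r
    density = within.sum(axis=1)        # neighbours inside the radius
    spread = (D * within).sum(axis=1)   # assumed tie-breaker: tighter first
    order = np.lexsort((spread, -density))
    centers = []
    for i in order:
        # Accept a sample only if it lies outside the radius of every
        # center chosen so far, i.e. in a different dense region.
        if all(D[i, c] > r for c in centers):
            centers.append(int(i))
        if len(centers) == k:
            return centers
    # Fewer than k separated regions: fill up with the next-ranked samples.
    for i in order:
        if len(centers) == k:
            break
        if int(i) not in centers:
            centers.append(int(i))
    return centers


def k_medoids(X, k, max_iter=100):
    """Alternating K-medoids started from density-based medoids."""
    D = pairwise_distances(X)
    medoids = init_medoids(D, k)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = []
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:       # keep the old medoid if empty
                new_medoids.append(medoids[j])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(costs)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels
```

Calling `k_medoids(X, k)` on an `(n, d)` NumPy array returns the indices of the chosen medoids and a cluster label per sample. Because the initial medoids are chosen deterministically from the density ranking, repeated runs on the same data give the same result, which is the stability property the abstract attributes to density-based initialization.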

Last Update: 2020-07-10