[1] WU Yun-long, LI Ling-juan. Implementation and Application of Fuzzy Clustering Algorithm Based on Spark[J]. Computer Technology and Development, 2019, 29(01): 130-134. [doi:10.3969/j.issn.1673-629X.2019.01.027]

Implementation and Application of Fuzzy Clustering Algorithm Based on Spark

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
2019, No. 01
Pages:
130-134
Column:
Security and Protection
Publication date:
2019-01-10

Article Info

Title:
Implementation and Application of Fuzzy Clustering Algorithm Based on Spark
Article ID:
1673-629X(2019)01-0130-05
Author(s):
WU Yun-long (吴云龙), LI Ling-juan (李玲娟)
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China
Keywords:
cluster analysis; fuzzy c-means; Spark; intrusion detection
CLC number:
TP39
DOI:
10.3969/j.issn.1673-629X.2019.01.027
Abstract:
As a representative soft clustering algorithm, fuzzy c-means (FCM) can objectively handle clustering problems that involve fuzziness. To meet the need for real-time, accurate clustering of big data and to improve the clustering efficiency of FCM on big data, we design a parallel implementation of FCM on Spark, a big data computing platform. The scheme uses HDFS for distributed storage of the underlying data, the RDD mechanism for data transformation during computation, and persistence to reuse intermediate results. To test the effectiveness of the parallel FCM algorithm, we apply it to an intrusion detection system. The KDD CUP 99 data set is first preprocessed; the algorithm then clusters the data set before and after preprocessing, on both a single machine and a Spark cluster, to detect intrusions, and the accuracy and timeliness of detection are compared. The results show that the Spark-based parallel FCM algorithm has good clustering robustness, convergence speed, and accuracy, and that its advantages are more pronounced on large-scale sample data.
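
The abstract does not give the implementation itself, so the following is only a minimal sketch of how an FCM iteration can be split into per-partition work plus a merge, which is the general shape a Spark version would take. It is plain Python with no Spark dependency. The function names (`fcm_partial_sums`, `merge`, `fcm`), the fuzzifier `m = 2.0`, the tolerance, and the list-of-lists stand-in for partitions are all assumptions made for illustration, not the authors' code.

```python
import functools


def fcm_partial_sums(points, centers, m=2.0):
    """Map step for one partition (hypothetical helper).

    Returns the partial numerators and denominators of the standard FCM
    center update: v_i = sum_j(u_ij^m * x_j) / sum_j(u_ij^m).
    """
    c, dim = len(centers), len(centers[0])
    num = [[0.0] * dim for _ in range(c)]
    den = [0.0] * c
    exp = 2.0 / (m - 1.0)
    for x in points:
        # Euclidean distances to every center, clamped to avoid division by zero.
        dists = [
            max(sum((a - b) ** 2 for a, b in zip(x, v)) ** 0.5, 1e-12)
            for v in centers
        ]
        for i in range(c):
            # Membership of x in cluster i: u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)).
            u = 1.0 / sum((dists[i] / dists[k]) ** exp for k in range(c))
            w = u ** m
            den[i] += w
            for d in range(dim):
                num[i][d] += w * x[d]
    return num, den


def merge(a, b):
    """Reduce step: add two partial results element-wise."""
    num = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    den = [p + q for p, q in zip(a[1], b[1])]
    return num, den


def fcm(partitions, centers, m=2.0, tol=1e-6, max_iter=100):
    """Iterate map + reduce until the centers move by less than tol."""
    for _ in range(max_iter):
        partials = [fcm_partial_sums(p, centers, m) for p in partitions]
        num, den = functools.reduce(merge, partials)
        new_centers = [[n / d for n in row] for row, d in zip(num, den)]
        shift = max(
            abs(a - b)
            for old, new in zip(centers, new_centers)
            for a, b in zip(old, new)
        )
        centers = new_centers
        if shift < tol:
            break
    return centers


# Two small "partitions" holding points from two well-separated groups.
partitions = [
    [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0)],
    [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
]
centers = fcm(partitions, [[1.0, 1.0], [9.0, 9.0]])
```

In PySpark, the list comprehension over `partitions` would plausibly become `RDD.mapPartitions`, the `functools.reduce` call would become `RDD.reduce`, and the input RDD would be persisted so that each iteration reuses it rather than rereading HDFS. Whether the paper structures its implementation this way is not stated on this page.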


Last Update: 2019-01-10