[1] CAI Ni-ming (蔡妮明), WANG Han-hu (王翰虎), CHEN Mei (陈梅). A New Streaming Data Cluster Algorithm Based on Sliding Window [J]. Computer Technology and Development (计算机技术与发展), 2011, (01): 23-26.

A New Streaming Data Cluster Algorithm Based on Sliding Window

Computer Technology and Development (计算机技术与发展) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2011, No. 01
Pages:
23-26
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
A New Streaming Data Cluster Algorithm Based on Sliding Window
Article ID:
1673-629X(2011)01-0023-04
Author(s):
CAI Ni-ming (蔡妮明), WANG Han-hu (王翰虎), CHEN Mei (陈梅)
Computer Science and Information School, Guizhou University (贵州大学计算机科学与信息学院)
Keywords:
data stream; clustering; sliding window; improved k-means algorithm
Classification number:
TP311
Document code:
A
Abstract:
In practical applications, users are often most interested in how a data stream has been distributed over the most recent period. CluStream, a traditional clustering algorithm based on the landmark model, does not eliminate expired tuples and therefore cannot accurately reflect the current data distribution of the stream. A sliding window is an approximation method that focuses on the recent data in a stream. To improve the quality and efficiency of data stream clustering, this paper proposes an improved algorithm based on CluStream that uses a sliding window to support data processing. To reduce the number of calculations in each iteration of the clustering operation, the algorithm uses an improved k-means to perform clustering. The optimized algorithm eliminates expired tuples in time while continuously processing newly arrived tuples in real time, and so obtains more accurate analysis results. Compared with CluStream, the optimized algorithm has lower memory overhead and faster data processing, and its clustering results are more reasonable and clear.
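
The general idea in the abstract, keeping only recent tuples and clustering them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: this record does not describe the paper's improved k-means or its CluStream-based summaries, so the sketch substitutes a count-based window and plain Lloyd's k-means. The class name `SlidingWindowKMeans` and its parameters are hypothetical.

```python
from collections import deque
import random


class SlidingWindowKMeans:
    """Count-based sliding window over a stream, clustered with plain k-means.

    Assumption: the window holds the last `window_size` tuples; the paper's
    actual window model and improved k-means are not specified in this record.
    """

    def __init__(self, k, window_size, seed=0):
        self.k = k
        # A bounded deque drops the oldest (expired) tuple automatically.
        self.window = deque(maxlen=window_size)
        self.rng = random.Random(seed)
        self.centers = []

    def add(self, point):
        """Append a newly arrived tuple to the window."""
        self.window.append(tuple(point))

    @staticmethod
    def _dist2(a, b):
        """Squared Euclidean distance between two tuples."""
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def cluster(self, iterations=20):
        """Run Lloyd's k-means on the tuples currently in the window."""
        points = list(self.window)
        if len(points) < self.k:
            raise ValueError("need at least k points in the window")
        centers = self.rng.sample(points, self.k)
        for _ in range(iterations):
            # Assignment step: attach each point to its nearest center.
            groups = [[] for _ in range(self.k)]
            for p in points:
                nearest = min(range(self.k),
                              key=lambda i: self._dist2(p, centers[i]))
                groups[nearest].append(p)
            # Update step: move each center to the mean of its group;
            # an empty group keeps its previous center.
            new_centers = [
                tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                for i, g in enumerate(groups)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        self.centers = centers
        return centers
```

For example, feeding six two-dimensional tuples into a window of size 4 leaves only the last four in `window`, so a subsequent `cluster()` call ignores the two expired tuples entirely. A landmark-style approach would instead keep every tuple since the landmark, which is the behaviour the abstract criticises.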


Memo

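Memo:
Supported by the Guizhou Provincial Science and Technology Program Industrial Research Fund (黔科合GY字[2008]3035). CAI Ni-ming (born 1985), female, master's student, CCF member, research interests: database technology and software engineering; WANG Han-hu, professor, senior CCF member, research interests: database technology and distributed systems; CHEN Mei, associate professor, research interests: database technology and software engineering.
Last Update: 1900-01-01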