[1] ZHANG Yi, CHEN Guo, ZHANG Zai-yue. Research on Scalable Clustering of User-oriented Short Text[J]. Computer Technology and Development, 2017, 27(11): 83-87.

 Research on Scalable Clustering of User-oriented Short Text

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
27
Issue:
2017, No. 11
Pages:
83-87
Section:
Intelligence, Algorithms, Systems Engineering
Publication date:
2017-11-10

Article Info

Title:
 Research on Scalable Clustering of User-oriented Short Text
Article ID:
1673-629X(2017)11-0083-05
Author(s):
 ZHANG Yi (张仪), CHEN Guo (陈国), ZHANG Zai-yue (张再跃)
 School of Computer Science and Engineering, Jiangsu University of Science and Technology
Keywords:
 short text; semantic normalization; offline clustering; online clustering
CLC number:
TP301
Document code:
A
Abstract:
 With the arrival of the big data era, user short-text data is growing explosively, and using cluster analysis to extract useful information from short texts has become very important. Cluster analysis, an important means of knowledge discovery, is the process of grouping objects according to the similarity of their features. To this end, a scalable (incremental) clustering method for user-oriented short text is proposed. The method consists of two parts, offline clustering and online clustering. Offline clustering builds on short-text preprocessing: it uses an irrelevant-word dictionary to identify and remove irrelevant words, then uses a word-class dictionary to normalize the semantics of the short text. A similarity calculation method based on multi-feature fusion is also proposed to cluster texts by relatedness. Online clustering takes the offline clustering results as features, clusters the incoming online texts, and merges the offline and online results to produce the final clustering. To verify the effectiveness and feasibility of the method, comparison experiments were conducted against a feature-vector-based similarity method. The results show a clustering recall of 73%, a clustering precision of 87.7%, and an F-measure of 79.6%, all better than the feature-vector-based method.
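The two-phase flow described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The dictionaries, the two fused features (token Jaccard and character-bigram Dice), the weight `w`, the threshold, and the single-pass assignment rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field

# Hypothetical dictionaries (not from the paper).
IRRELEVANT = {"please", "hello", "thanks", "the", "a", "my", "i", "how", "do"}
WORD_CLASS = {"passcode": "password", "pwd": "password",
              "change": "reset", "modify": "reset", "bill": "invoice"}

def normalize(text):
    """Tokenize, drop irrelevant words, then map each word to its word class."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    return [WORD_CLASS.get(t, t) for t in tokens if t and t not in IRRELEVANT]

def jaccard(a, b):
    """Token-level overlap (assumed feature 1)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def bigrams(tokens):
    """Character bigrams taken inside each token, so token order is irrelevant."""
    return {t[i:i + 2] for t in tokens for i in range(len(t) - 1)}

def dice(a, b):
    """Character-bigram Dice coefficient (assumed feature 2)."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y)) if x or y else 0.0

def similarity(a, b, w=0.6):
    """Fuse the two features with an assumed linear weight w."""
    return w * jaccard(a, b) + (1 - w) * dice(a, b)

@dataclass
class Cluster:
    tokens: set = field(default_factory=set)    # union of member tokens, used as the cluster feature
    members: list = field(default_factory=list)

def assign(clusters, text, threshold=0.3):
    """Put text into its most similar cluster, or open a new one below the threshold."""
    toks = normalize(text)
    best = max(clusters, key=lambda c: similarity(toks, c.tokens), default=None)
    if best is None or similarity(toks, best.tokens) < threshold:
        best = Cluster()
        clusters.append(best)
    best.tokens |= set(toks)
    best.members.append(text)
    return best

def offline_cluster(texts, threshold=0.3):
    """Offline phase: single pass over the stored corpus."""
    clusters = []
    for t in texts:
        assign(clusters, t, threshold)
    return clusters

def online_cluster(offline, stream, threshold=0.3):
    """Online phase: start from copies of the offline clusters and return the merged result."""
    merged = [Cluster(set(c.tokens), list(c.members)) for c in offline]
    for t in stream:
        assign(merged, t, threshold)
    return merged

offline = offline_cluster(["How do I reset my password?",
                           "Please change the passcode",
                           "Where is my invoice"])
merged = online_cluster(offline, ["modify pwd please", "cancel subscription"])
for c in merged:
    print(sorted(c.tokens), c.members)
```

In this sketch the offline clusters act as the features for the online phase: a new text either joins an existing cluster or starts a new one, and the returned list is the merged result. The offline clusters are copied first, so the offline result stays unchanged.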


Last Update: 2017-12-26