[1] LIU Hong-bing, LI Wen-kun, ZHANG Yang-sen. Microblog Topic Detection Based on LDA Model and Multi-level Clustering[J]. Computer Technology and Development, 2016, 26(06): 25-30.

 Microblog Topic Detection Based on LDA Model and Multi-level Clustering

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
26
Issue:
2016, No. 06
Pages:
25-30
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2016-06-10

Article Info

Title:
 Microblog Topic Detection Based on LDA Model and Multi-level Clustering
Article ID:
1673-629X(2016)06-0025-06
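Author(s):
 LIU Hong-bing[1], LI Wen-kun[2], ZHANG Yang-sen[2]
 1. School of Electronic Information Engineering, Taiyuan University of Science and Technology; 2. Institute of Intelligent Information Processing, Beijing Information Science and Technology University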
Keywords:
 LDA model; topic detection; improved Single-Pass clustering; hierarchical clustering
Classification number:
TP391
Document code:
A
Abstract:
 With the wide adoption of microblogs as an emerging social medium, research built on microblog data keeps growing, and microblog topic detection is one of the current research hotspots. Taking the characteristics of microblog text into account, this paper proposes a microblog topic detection method based on the LDA model and multi-level clustering. First, the LDA model is used to model the microblog data and extract features. Then, improved Single-Pass clustering and hierarchical clustering are applied to cluster the microblog data and discover hot topics. Topic detection experiments on large-scale microblog data show three results: LDA modeling performs better than TF-IDF for feature selection and weight calculation; the improved Single-Pass clustering can handle the microblogs left unprocessed by the first Single-Pass run, which improves the accuracy of the initial clustering and reduces the time needed for the subsequent hierarchical clustering; and multi-level clustering outperforms a single clustering algorithm in accuracy, recall, and F-value. The proposed topic detection method is therefore feasible and effective.
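
The two clustering stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how Single-Pass was improved, so the code shows only a plain Single-Pass run followed by a centroid-based agglomerative merge. The toy vectors stand in for LDA document-topic distributions, and the function names (`single_pass`, `merge_clusters`), the cosine similarity measure, and both thresholds are assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def single_pass(vectors, threshold):
    """Stage 1 (assumed form): scan documents once; join the most similar
    cluster if its centroid similarity reaches `threshold`, else open a new one."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(v, centroid([vectors[j] for j in c]))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

def merge_clusters(vectors, clusters, threshold):
    """Stage 2 (assumed form): agglomerative merging; repeatedly merge the
    two clusters with the most similar centroids until none reach `threshold`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best_pair, best_sim = None, threshold
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(centroid([vectors[j] for j in clusters[a]]),
                             centroid([vectors[j] for j in clusters[b]]))
                if sim >= best_sim:
                    best_pair, best_sim = (a, b), sim
        if best_pair is None:
            break
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters

# Hypothetical document-topic distributions over 3 topics (not real data).
docs = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.70, 0.25, 0.05],
]
first = single_pass(docs, threshold=0.99)            # strict first pass
final = merge_clusters(docs, first, threshold=0.90)  # looser merge step
print(first, final)
```

With a strict first-pass threshold, the fifth toy document is left in a cluster of its own, and the looser merge step then attaches it to the nearest topic cluster. Running the cheap single scan first means the quadratic merge step only has to compare a small number of cluster centroids rather than every document pair, which matches the abstract's remark that the initial clustering reduces the time spent in hierarchical clustering.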


Last Update: 2016-09-19