[1] HU Yong-li, GONG Pei-zeng. Document Clustering Research Based on Fuzzy C-Means and Improved Latent Semantic Analysis[J]. Computer Technology and Development, 2010, (12): 126-129.

Document Clustering Research Based on Fuzzy C-Means and Improved Latent Semantic Analysis

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2010, No. 12
Pages:
126-129
Section:
Intelligence, Algorithms, Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
Document Clustering Research Based on Fuzzy C-Means and Improved Latent Semantic Analysis
Article ID:
1673-629X(2010)12-0126-04
Author(s):
HU Yong-li, GONG Pei-zeng
Dept. of Computer Science and Technology, Electronics and Information Engineering College, Tongji University
Keywords:
fuzzy c-means; LSA; document clustering
CLC number:
TP391.1
Document code:
A
Abstract:
This paper studies document clustering, that is, grouping the documents of a given collection so that the resulting clusters are accurate. It proposes a method that combines fuzzy C-means (FCM) with an improved form of latent semantic analysis (LSA). An improved term feature extraction method is used to construct the term-document matrix. Singular value decomposition is applied to this matrix to extract the latent semantic space of the text from the traditional vector space model (VSM), so that high-dimensional document vectors are mapped to low-dimensional semantic vectors. The similarity between two documents is the cosine of the angle between their semantic vectors. Fuzzy C-means then clusters the documents based on these similarities. In experiments on document data from a campus forum, the method reduces computational complexity and improves the accuracy of the similarity calculation. The experimental results show that the proposed method achieves a better clustering effect with high clustering accuracy.
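
The pipeline outlined in the abstract (term-document matrix, SVD, cosine similarity of semantic vectors, fuzzy C-means) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the improved feature extraction, so raw term counts stand in for it here. The toy matrix, the function names, the rank `k`, the fuzzifier `m`, and the choice to run FCM on length-normalized semantic vectors (so that Euclidean distance is monotone in cosine similarity) are all assumptions made for illustration.

```python
import numpy as np

def lsa_embed(term_doc, k):
    """Map documents to k-dimensional semantic vectors via truncated SVD.

    term_doc has shape (n_terms, n_docs); the result has shape (n_docs, k).
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Document coordinates in the latent space: rows of V_k scaled by sigma_k.
    return vt[:k].T * s[:k]

def cosine_similarity(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
    unit = x / norms
    return unit @ unit.T

def fuzzy_c_means(x, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means; returns (memberships, centers)."""
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.clip(dist, 1e-12, None)      # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return u, centers

# Hypothetical toy corpus: 6 terms x 4 documents, two unrelated topics.
# Raw counts are a placeholder for the paper's improved term weighting.
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
    [0, 0, 1, 1],
], dtype=float)

semantic = lsa_embed(term_doc, k=2)
sim = cosine_similarity(semantic)

# Assumption: cluster unit-length semantic vectors, so the Euclidean
# distance used by FCM satisfies d^2 = 2 - 2 * cos.
unit = semantic / np.linalg.norm(semantic, axis=1, keepdims=True)
memberships, centers = fuzzy_c_means(unit, c=2)
labels = memberships.argmax(axis=1)
print(np.round(sim, 3))
print(labels)
```

In this toy corpus the two topics share no terms, so documents within a topic end up with the same semantic direction and the memberships are nearly crisp. On real forum data the memberships would be graded, which is the reason to prefer a fuzzy method over hard assignment.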

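Memo:
HU Yong-li (1984-), female, from Hohhot, Inner Mongolia, master's student; research interests: image processing and pattern recognition. GONG Pei-zeng, professor, Shanghai Distinguished Teacher, master's supervisor; research interests: pattern recognition and intelligent systems.
Last Update: 1900-01-01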