[1]王文霞,王春红. 短信文本分类技术的研究[J].计算机技术与发展,2016,26(05):145-148.
 WANG Wen-xia,WANG Chun-hong. Research on Text Classification Technology for Message[J].,2016,26(05):145-148.
点击复制

 短信文本分类技术的研究()
分享到:

《计算机技术与发展》[ISSN:1006-6977/CN:61-1281/TN]

卷:
26
期数:
2016年05期
页码:
145-148
栏目:
应用开发研究
出版日期:
2016-05-10

文章信息/Info

Title:
 Research on Text Classification Technology for Message
文章编号:
1673-629X(2016)05-0145-04
作者:
 王文霞王春红
 运城学院 计算机科学与技术系
Author(s):
 WANG Wen-xiaWANG Chun-hong
关键词:
 短信文本分类向量空间模型模糊聚类模糊K均值
Keywords:
 text categorizationvector space modelfuzzy clusteringfuzzy K-means
分类号:
TP301
文献标志码:
A
摘要:
 短信作为一种重要的交流手段,发挥着越来越重要的作用.但伴随着短信的广泛使用,垃圾短信则严重影响着人们的生活,因此文中基于短信文本特征词对短信进行分类研究.其中,TF-IDF特征词权重计算方法是对文本词汇权重计算的一种经典算法,得到了广泛应用.但此方法为了简化计算,忽略了词语之间的相互关系.针对此问题,依据同一短信文本中的词汇之间存在的相互关系,文中对权重计算法进行了调整,提出了基于模糊K均值的短信文本分类算法.即先将短信文本集用TF-IDF算法处理,得到词汇-文本集,再用模糊K均值算法对得到的词汇-文本集进行处理.最后通过实验,验证了基于模糊K均值的短信文本分类算法,其分类结果的查全率和查准率都较高,有效辨别了垃圾短信.
Abstract:
 As an important means of communication,SMS plays an increasingly important role. But along with the extensive use of SMS, SMS spam seriously influences people’ s lives. Therefore,the classification of SMS is researched based on the keywords in this paper. TF-IDF weight calculation method is a classical algorithm to calculate the text word weight,which is widely used. But in order to calculate simply,this method ignores the mutual relations between words. Aiming at this problem,based on the same relationship between words in the text messages,in this paper,the weighting method is used for adjusting,it puts forward the text classification based on fuzzy K-means algorithm. The text set is processed by TF-IDF algorithm,getting a vocabulary-text set. Then fuzzy K-means algorithm is used to get a vocabulary-text set. Finally,through the experiment to verify the text classification based on fuzzy K-means algorithm,the classification results of recall and precision is high.

相似文献/References:

[1]张志宏,吴庆波,邵立松,等.基于飞腾平台TOE协议栈的设计与实现[J].计算机技术与发展,2014,24(07):1.
 ZHANG Zhi-hong,WU Qing-bo,SHAO Li-song,et al. Design and Implementation of TCP/IP Offload Engine Protocol Stack Based on FT Platform[J].,2014,24(05):1.
[2]梁文快,李毅. 改进的基因表达算法对航班优化排序问题研究[J].计算机技术与发展,2014,24(07):5.
 LIANG Wen-kuai,LI Yi. Research on Optimization of Flight Scheduling Problem Based on Improved Gene Expression Algorithm[J].,2014,24(05):5.
[3]黄静,王枫,谢志新,等. EAST文档管理系统的设计与实现[J].计算机技术与发展,2014,24(07):13.
 HUANG Jing,WANG Feng,XIE Zhi-xin,et al. Design and Implementation of EAST Document Management System[J].,2014,24(05):13.
[4]侯善江[],张代远[][][]. 基于样条权函数神经网络P2P流量识别方法[J].计算机技术与发展,2014,24(07):21.
 HOU Shan-jiang[],ZHANG Dai-yuan[][][]. P2P Traffic Identification Based on Spline Weight Function Neural Network[J].,2014,24(05):21.
[5]李璨,耿国华,李康,等. 一种基于三维模型的文物碎片线图生成方法[J].计算机技术与发展,2014,24(07):25.
 LI Can,GENG Guo-hua,LI Kang,et al. A Method of Obtaining Cultural Debris’ s Line Chart Based on Three-dimensional Model[J].,2014,24(05):25.
[6]翁鹤,皮德常. 混沌RBF神经网络异常检测算法[J].计算机技术与发展,2014,24(07):29.
 WENG He,PI De-chang. Chaotic RBF Neural Network Anomaly Detection Algorithm[J].,2014,24(05):29.
[7]刘茜[],荆晓远[],李文倩[],等. 基于流形学习的正交稀疏保留投影[J].计算机技术与发展,2014,24(07):34.
 LIU Qian[],JING Xiao-yuan[,LI Wen-qian[],et al. Orthogonal Sparsity Preserving Projections Based on Manifold Learning[J].,2014,24(05):34.
[8]尚福华,李想,巩淼. 基于模糊框架-产生式知识表示及推理研究[J].计算机技术与发展,2014,24(07):38.
 SHANG Fu-hua,LI Xiang,GONG Miao. Research on Knowledge Representation and Inference Based on Fuzzy Framework-production[J].,2014,24(05):38.
[9]叶偲,李良福,肖樟树. 一种去除运动目标重影的图像镶嵌方法研究[J].计算机技术与发展,2014,24(07):43.
 YE Si,LI Liang-fu,XIAO Zhang-shu. Research of an Image Mosaic Method for Removing Ghost of Moving Targets[J].,2014,24(05):43.
[10]余松平[][],蔡志平[],吴建进[],等. GSM-R信令监测选择录音系统设计与实现[J].计算机技术与发展,2014,24(07):47.
 YU Song-ping[][],CAI Zhi-ping[] WU Jian-jin[],GU Feng-zhi[]. Design and Implementation of an Optional Voice Recording System Based on GSM-R Signaling Monitoring[J].,2014,24(05):47.

更新日期/Last Update: 2016-09-19