[1] XU Tian-hua, WU Ming-li. An Improved Naive Bayes Algorithm Based on TF-IDF[J]. Computer Technology and Development, 2020, 30(02): 75-79. [doi:10.3969/j.issn.1673-629X.2020.02.016]

An Improved Naive Bayes Algorithm Based on TF-IDF

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume: 30
Issue: 2020, No. 02
Pages: 75-79
Column: Intelligence, Algorithms, and Systems Engineering
Publication date: 2020-02-10

Article Information

Title: An Improved Naive Bayes Algorithm Based on TF-IDF
Article ID: 1673-629X(2020)02-0075-05
Author(s): XU Tian-hua, WU Ming-li
Affiliation: School of Informatics, North China University of Technology, Beijing 100144, China
Keywords: naive Bayes; TF-IDF algorithm; decentralization; location information; feature weight
CLC number: TP391
DOI: 10.3969/j.issn.1673-629X.2020.02.016
Abstract:
Text classification algorithms, typified by naive Bayes, generally treat all features as equally weighted and rely on a single indicator. To address this, we propose TF-IDF-DL naive Bayes, an improved naive Bayes algorithm based on TF-IDF. Building on TF-IDF, the algorithm introduces a decentralized term-frequency factor and a feature-word position factor to make the feature weights more accurate. To verify its effect, we run experiments on the Sogou news dataset from Sogou Labs. The results show that introducing TF-IDF-DL into the naive Bayes classifier yields good accuracy, recall, and F1 in text classification. Compared with TF-IDF-dist Bayes, a comparable scheme from domestic research, classification accuracy increases by 8.6%, recall by 11.7%, and F1 by 7.4%. The proposed algorithm therefore improves classification performance and, to some extent, also classifies hard-to-distinguish categories well.
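The abstract does not give the formulas for the decentralized term-frequency factor or the position factor, so the following is only a minimal sketch of the baseline the paper builds on: a multinomial naive Bayes classifier whose per-class term counts are replaced by TF-IDF weights. The function names (`tfidf_weights`, `train`, `predict`), the smoothed IDF form, the add-one smoothing, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Assumed form: relative term frequency times a smoothed IDF.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


def train(docs, labels):
    """Fit naive Bayes using summed TF-IDF weights in place of raw counts."""
    vocab = {term for doc in docs for term in doc}
    class_sizes = Counter(labels)
    term_mass = defaultdict(Counter)
    for doc_weights, label in zip(tfidf_weights(docs), labels):
        term_mass[label].update(doc_weights)  # accumulate weights per class
    model = {}
    for label, size in class_sizes.items():
        total = sum(term_mass[label].values())
        log_prior = math.log(size / len(docs))
        # Add-one smoothing over the vocabulary (an assumption of this sketch).
        log_like = {
            term: math.log((term_mass[label][term] + 1.0) / (total + len(vocab)))
            for term in vocab
        }
        model[label] = (log_prior, log_like)
    return model


def predict(model, doc):
    """Return the label with the highest log posterior; unseen terms are skipped."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[t] for t in doc if t in log_like)
    return max(model, key=score)


# Toy, hand-made corpus (hypothetical; not the Sogou news dataset).
docs = [
    ["stock", "market", "rises"],
    ["team", "wins", "match"],
    ["market", "falls", "stock"],
    ["match", "ends", "team", "wins"],
]
labels = ["finance", "sports", "finance", "sports"]
model = train(docs, labels)
print(predict(model, ["stock", "market"]))
print(predict(model, ["team", "wins"]))
```

A TF-IDF-DL variant would presumably multiply additional factors into the weight computed in `tfidf_weights`, but their exact definitions would have to come from the full paper.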

Last Update: 2020-02-10