[1] YANG Dong, WU Tao, ZHAO Xue-qing, et al. Short Text Classification Based on Improved TF-IDF Integrated with Binary Grey Wolf Optimization[J]. Computer Technology and Development, 2024, 34(08): 37-41. [doi:10.20165/j.cnki.ISSN1673-629X.2024.0122]

Short Text Classification Based on Improved TF-IDF Integrated with Binary Grey Wolf Optimization

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
34
Issue:
2024, No. 08
Pages:
37-41
Column:
Media Computing
Publication date:
2024-08-10

Article Info

Title:
Short Text Classification Based on Improved TF-IDF Integrated with Binary Grey Wolf Optimization
Article ID:
1673-629X(2024)08-0037-05
Author(s):
YANG Dong, WU Tao, ZHAO Xue-qing, LI Meng
School of Computer Science, Xi'an Polytechnic University, Xi'an 710048, Shaanxi, China
Keywords:
short text classification; feature weighting; TF-IDF-RANK method; feature selection; binary grey wolf optimization
Classification number:
TP391.1
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0122
Abstract:
To improve classification accuracy and reduce feature dimensionality for special types of short text, a short text classification method based on an improved TF-IDF method integrated with Binary Grey Wolf Optimization (BGWO) is proposed. To compute feature-vector text weights more accurately, a likes ranking factor is proposed and combined with text feature concentration to weight special types of text that carry a like count, yielding the improved TF-IDF-RANK method for feature weighting. Meanwhile, starting from the initially selected feature vectors, the BGWO algorithm is redesigned to search for the optimal feature subset, introducing an attenuation coefficient vector and a multi-optimal-solution iteration mechanism to improve grey wolf search performance. The results show that the proposed method effectively improves weighting accuracy, better characterizes the initially selected feature vectors, and strengthens the ability to find the global optimum during feature selection, thereby improving short text classification. In tests on the LABIC and Douyin open platform datasets, the F1 score improved by 14.76% and 14.02% respectively, which verifies the effectiveness of the proposed method for classifying special types of text.
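
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the TF-IDF-RANK formula, so the code below is not the authors' method: `tf_idf` is the textbook TF-IDF, and `likes_rank_factor` is a hypothetical stand-in that simply scales each text by its rank in the like-count ordering. The function names, the rank-over-N scaling, and the toy data are all assumptions made for illustration; the text feature concentration term is omitted entirely.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Textbook TF-IDF: tf = count / document length, idf = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def likes_rank_factor(likes):
    """HYPOTHETICAL factor in (0, 1]: rank by like count, divided by N.

    The most-liked text gets 1.0 and the least-liked gets 1/N.
    This is an assumption, not the paper's likes ranking factor.
    """
    n = len(likes)
    order = sorted(range(n), key=lambda i: likes[i])  # ascending like count
    factor = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        factor[idx] = rank / n
    return factor


def weighted_tf_idf(docs, likes):
    """Scale each document's TF-IDF weights by its (hypothetical) likes factor."""
    base = tf_idf(docs)
    factors = likes_rank_factor(likes)
    return [{t: w * f for t, w in doc_w.items()} for doc_w, f in zip(base, factors)]


# Toy corpus: tokenized short texts and their like counts (made-up data).
docs = [["good", "phone", "good"], ["bad", "phone"], ["good", "camera"]]
likes = [120, 3, 45]
weights = weighted_tf_idf(docs, likes)
```

Under these assumptions, terms in heavily liked texts keep their full TF-IDF weight while terms in rarely liked texts are damped, which is one simple way a like count could influence feature weights before a feature selection step such as BGWO.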


Last Update: 2024-08-10