[1] XIAN Guang-ming, WANG Lu-dong, ZENG Bi-qing, et al. Text Classification Based on LDA and BiGRU[J]. Computer Technology and Development, 2022, 32(04): 15-20. DOI: 10.3969/j.issn.1673-629X.2022.04.003

Text Classification Based on LDA and BiGRU

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume: 32
Issue: No. 04, 2022
Pages: 15-20
Section: Artificial Intelligence
Publication date: 2022-04-10

Article Info

Title: Text Classification Based on LDA and BiGRU
Article No.: 1673-629X(2022)04-0015-06
Author(s): XIAN Guang-ming (冼广铭), WANG Lu-dong (王鲁栋), ZENG Bi-qing (曾碧卿), MEI Hao-yang (梅灏洋), TAO Rui (陶睿)
Affiliation: School of Software, South China Normal University, Foshan 528225, Guangdong, China
Keywords: LDA topic model; BiGRU; Word2vec; deep learning; text classification
CLC number: TP391.1; TP183
DOI: 10.3969/j.issn.1673-629X.2022.04.003
Abstract:
Text classification is a fundamental task in natural language processing, and its results depend on both the sparsity of features in the text and the neural network used to extract them. To address insufficient feature information in text and the weak modeling of contextual dependencies in traditional models, we propose a classification method that fuses TF-IDF-weighted word vectors with an LDA topic model and uses a bidirectional gated recurrent unit (BiGRU) layer to fully extract deep features from the text. The main dataset is the news text classification dataset from the Tianchi competition. First, word vectors are trained on the corpus separately with Word2vec and with LDA. The Word2vec vectors, weighted by TF-IDF, are then simply concatenated with the LDA-trained word vectors that have been extended by maximum topic probability, yielding a text matrix. This matrix is fed into the BiGRU network, which extracts deep feature vectors of the text from the forward and backward directions. Finally, a softmax function performs multi-class classification, and the category is determined from the output probabilities. Compared with existing common text classification models, the proposed method achieves notable improvements in accuracy, F1 score, and other evaluation metrics.
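The abstract describes the pipeline only in outline, so the NumPy sketch below is a rough illustration of its data flow, not the authors' implementation. Several assumptions are made: random vectors stand in for the trained Word2vec and LDA models; TF-IDF uses the common tf × log(N/df) variant; "maximum topic probability expansion" is read, hypothetically, as giving each word a K-dimensional LDA vector that keeps only its most probable topic p(k*|w) (with p(k|w) taken as proportional to p(w|k), i.e. a uniform topic prior); and the BiGRU is untrained and classifies from its final forward and backward hidden states. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical): a tiny tokenized corpus, random "Word2vec"
# embeddings and a random LDA topic-word matrix p(w|k). In the paper these
# come from models trained on the Tianchi news corpus.
corpus = [["stock", "market", "rises"],
          ["team", "wins", "match"],
          ["market", "team", "stock"]]
vocab = sorted({w for doc in corpus for w in doc})
idx = {w: i for i, w in enumerate(vocab)}
EMB_DIM, NUM_TOPICS, HIDDEN, NUM_CLASSES = 8, 4, 6, 3
w2v = rng.normal(size=(len(vocab), EMB_DIM))                       # word -> vector
topic_word = rng.dirichlet(np.ones(len(vocab)), size=NUM_TOPICS)   # shape (K, V)


def tfidf(doc, corpus):
    """TF-IDF weight of each token in doc: tf * log(N / df) (one common variant)."""
    weights = []
    for w in doc:
        tf = doc.count(w) / len(doc)
        df = sum(1 for d in corpus if w in d)
        weights.append(tf * np.log(len(corpus) / df))
    return np.array(weights)


def lda_max_topic_vector(word):
    """Assumed reading of the expansion: keep only the word's most probable
    topic p(k*|w), zero elsewhere (uniform topic prior assumed)."""
    col = topic_word[:, idx[word]]
    p_k_given_w = col / col.sum()
    vec = np.zeros(NUM_TOPICS)
    k_star = np.argmax(p_k_given_w)
    vec[k_star] = p_k_given_w[k_star]
    return vec


def text_matrix(doc, corpus):
    """One row per token: TF-IDF-weighted Word2vec vector ++ LDA vector."""
    weights = tfidf(doc, corpus)
    rows = [np.concatenate([wt * w2v[idx[w]], lda_max_topic_vector(w)])
            for w, wt in zip(doc, weights)]
    return np.stack(rows)                        # (seq_len, EMB_DIM + NUM_TOPICS)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_params(in_dim, hid):
    """Random (untrained) input, recurrent and bias weights for gates z, r, h~."""
    return {g: (rng.normal(scale=0.1, size=(hid, in_dim)),
                rng.normal(scale=0.1, size=(hid, hid)),
                np.zeros(hid)) for g in ("z", "r", "h")}


def gru_run(X, p):
    """Standard GRU recurrence over the rows of X; returns the last hidden state."""
    h = np.zeros(p["z"][1].shape[0])
    for x in X:
        z = sigmoid(p["z"][0] @ x + p["z"][1] @ h + p["z"][2])   # update gate
        r = sigmoid(p["r"][0] @ x + p["r"][1] @ h + p["r"][2])   # reset gate
        h_tilde = np.tanh(p["h"][0] @ x + p["h"][1] @ (r * h) + p["h"][2])
        h = (1 - z) * h + z * h_tilde
    return h


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


IN_DIM = EMB_DIM + NUM_TOPICS
fwd, bwd = gru_params(IN_DIM, HIDDEN), gru_params(IN_DIM, HIDDEN)
W_out = rng.normal(scale=0.1, size=(NUM_CLASSES, 2 * HIDDEN))


def classify(doc):
    X = text_matrix(doc, corpus)
    # BiGRU: run the sequence forwards and reversed, concatenate final states.
    h = np.concatenate([gru_run(X, fwd), gru_run(X[::-1], bwd)])
    return softmax(W_out @ h)                    # class probabilities


probs = classify(corpus[0])
print(probs, probs.argmax())
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how the concatenated text matrix flows through both GRU directions into a softmax. A real system would train Word2vec and LDA on the corpus and learn the BiGRU and output weights with a cross-entropy loss.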


Last Update: 2022-04-10