[1] GAO Gui, ZHAO Yang, YU Shu-juan, et al. Research on Text Classification Algorithm Based on GNN[J]. Computer Technology and Development, 2023, 33(05): 138-144. [doi:10.3969/j.issn.1673-629X.2023.05.021]

Research on Text Classification Algorithm Based on GNN

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
33
Issue:
2023, No. 05
Pages:
138-144
Section:
Artificial Intelligence
Publication date:
2023-05-10

Article Info

Title:
Research on Text Classification Algorithm Based on GNN
Article ID:
1673-629X(2023)05-0138-07
Author(s):
GAO Gui, ZHAO Yang, YU Shu-juan, YAO Cheng-jie, HUANG Li-ya
School of Electronic and Optical Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
Keywords:
graph neural network; text classification; data augmentation; word embedding; attention mechanism
Classification number:
TP391.1
DOI:
10.3969/j.issn.1673-629X.2023.05.021
Abstract:
Graph Neural Networks (GNN) have gained wide attention in text classification tasks because of their novel structure. To address the overfitting and insufficient feature information that GNN models are prone to when the training dataset is small, we propose the Att-DASA-ReGNN model (Regional Embedding GNN based on Data Augmentation and Self-Attention with the Attention Mechanisms). In the data feature extraction stage, the model applies Easy Data Augmentation (EDA) and Self-Attention to reduce overfitting. Because the word embedding method of the original model cannot adequately capture high-order neighborhood information that is high-dimensional and sparse, the model adds regional word embedding. This strengthens the relationships between words, making it easier for the model to capture high-order neighborhood information and thereby mitigating the impact of data sparsity. To further improve text classification accuracy, the graph-word feature interaction stage introduces Soft-Attention to improve how attention weights are extracted. Finally, experiments on several datasets show that the classification accuracy of the model improves to varying degrees over previous models.
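
The abstract names EDA as the augmentation step but does not say which operations or parameters the authors used. As commonly described, EDA combines four token-level operations: synonym replacement, random insertion, random swap, and random deletion. The sketch below illustrates only the two that need no thesaurus. The function names, the `alpha` rate, and the whitespace tokenization are assumptions for illustration, not the paper's implementation.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps randomly chosen position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p; never return an empty list."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_aug=4, alpha=0.1, seed=0):
    """Produce n_aug variants of a sentence, alternating swap and deletion.

    alpha (hypothetical default) controls both the number of swaps, as a
    fraction of sentence length, and the per-token deletion probability.
    """
    rng = random.Random(seed)
    tokens = sentence.split()  # assumption: whitespace tokenization
    n_swaps = max(1, int(alpha * len(tokens)))
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            variants.append(" ".join(random_swap(tokens, n_swaps, rng)))
        else:
            variants.append(" ".join(random_deletion(tokens, alpha, rng)))
    return variants


if __name__ == "__main__":
    for variant in augment("graph neural networks classify short text documents"):
        print(variant)
```

Each variant keeps the label of the original sentence, so a small training set can be enlarged without new annotation. For Chinese text, a word segmenter would have to replace the whitespace split.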

Last Update: 2023-05-10