[1] GUAN Hui, ZONG Fu-yan, QU Pan. User Comment Classification Based on BTM and Long Text Semantic Enhancement[J]. Computer Technology and Development, 2023, 33(07): 181-187. [doi:10.3969/j.issn.1673-629X.2023.07.027]

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
33
Issue:
2023, No. 07
Pages:
181-187
Section:
Artificial Intelligence
Publication date:
2023-07-10

Article Information

Title:
User Comment Classification Based on BTM and Long Text Semantic Enhancement
Article ID:
1673-629X(2023)07-0181-07
Author(s):
GUAN Hui (关慧) 1,2; ZONG Fu-yan (宗福焱) 1; QU Pan (曲盼) 1
1. Department of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110142, China;
2. Liaoning Key Laboratory of Industrial Intelligence Technology on Chemical Process, Shenyang 110142, China
Keywords:
word vector; topic model; user comments; short text extension; long text; support vector machine
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2023.07.027
Abstract:
User comment classification is a direct way to mine useful information from user comments and deliver it to enterprises and users. However, short user comments have sparse features, irregular expressions, and little feedback information, so traditional classification algorithms perform poorly on them. We propose a short text classification method that fuses word vectors with the BTM topic model and is assisted by long texts. First, we select specific long texts, use the LDA topic model to obtain their document-topic distribution, take the topic with the highest probability, and mine the topic-term distribution under that topic; the top n terms with the highest probability serve as expansion terms for the short texts, and the user comments are semantically enhanced with the long texts according to matching rules. Then features are extended for the expanded short texts: Word2vec and LSTM are trained on the user comments to obtain the encoding features of the word vectors. At the same time, a BTM topic model based on Gibbs sampling is built on the short user comments to obtain their topic probability features. The word vector encoding features and the topic probability features are fused to obtain the extended text features, and finally the SVM (Support Vector Machine) method is used for text classification. Compared with other classification methods, the proposed method improves accuracy, recall, and F-measure.
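
The abstract names three data-handling steps that a small example may make more concrete: picking the top n terms of a topic, expanding a short comment with them, and fusing two feature vectors. The following is only an illustrative sketch under stated assumptions, not the authors' implementation. The matching rule (expand a comment only if it already contains one of the topic's top terms), the fusion by plain concatenation, the function names, and all toy values are hypothetical; the abstract does not specify them. The biterm helper shows the unordered word pairs that BTM models in place of whole documents.

```python
from itertools import combinations


def top_n_terms(topic_term_probs, n):
    """Return the n most probable terms of one topic (ties broken alphabetically)."""
    ranked = sorted(topic_term_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def expand_short_text(tokens, expansion_terms):
    """Hypothetical matching rule: if the comment shares at least one term with
    the topic's top terms, append the top terms it does not already contain."""
    if not any(tok in expansion_terms for tok in tokens):
        return list(tokens)  # no match, leave the comment unchanged
    present = set(tokens)
    return list(tokens) + [t for t in expansion_terms if t not in present]


def extract_biterms(tokens):
    """All unordered word pairs co-occurring in one short text (BTM's unit of modeling)."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def fuse_features(encoding_features, topic_probs):
    """Assumed fusion: concatenate the encoding vector and the topic probabilities."""
    return list(encoding_features) + list(topic_probs)


# Toy topic-term distribution standing in for the output of LDA on long texts.
topic = {"battery": 0.30, "screen": 0.25, "charge": 0.20, "price": 0.15, "color": 0.10}
terms = top_n_terms(topic, 3)

# A toy tokenized user comment.
comment = ["battery", "dies", "fast"]
expanded = expand_short_text(comment, terms)
biterms = extract_biterms(comment)

# Toy vectors standing in for the Word2vec+LSTM encoding and the BTM topic probabilities.
fused = fuse_features([0.1, 0.2], [0.7, 0.3])

print(terms)
print(expanded)
print(biterms)
print(fused)
```

In a full pipeline the fused vectors would be passed to an SVM classifier; that step is omitted here because the abstract gives no kernel or parameter details.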

Last Update: 2023-07-10