[1] XU Hong-kui, ZHOU Jun-jie, JIANG Tong-tong, et al. Chinese Telephone Fraud Text Recognition Based on Word Embedding and Hybrid Neural Network[J]. Computer Technology and Development, 2022, 32(11): 37-42. [doi:10.3969/j.issn.1673-629X.2022.11.006]

Telephone Fraud Text Recognition Based on BERT and a Hybrid Neural Network
Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
No. 11, 2022
Pages:
37-42
Column:
Big Data and Cloud Computing
Publication date:
2022-11-10

Article Information/Info

Title:
Chinese Telephone Fraud Text Recognition Based on Word Embedding and Hybrid Neural Network
Article ID:
1673-629X(2022)11-0037-06
Author(s):
XU Hong-kui (许鸿奎) 1,2; ZHOU Jun-jie (周俊杰) 1; JIANG Tong-tong (姜彤彤) 1; LU Jiang-kun (卢江坤) 1; ZHANG Zi-feng (张子枫) 1; HU Wen-ye (胡文烨) 1
1. School of Information & Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China;
2. Shandong Provincial Key Laboratory of Intelligent Building Technology, Jinan 250101, China
Keywords:
telephone fraud; word embedding; BERT; convolutional neural network; BiLSTM
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2022.11.006
Abstract:
Telephone fraud cases now emerge one after another, seriously endangering people's property security and social harmony and stability. To address this problem, a text classification method based on word embedding and a hybrid neural network is proposed to classify fraudulent phone call text. First, a text data set of fraudulent calls is constructed, covering various kinds of fraud incidents such as finance, education, mail delivery, and banking. To improve the input word vectors, the Transformer-based BERT (Bidirectional Encoder Representation from Transformers) model is used for word embedding to represent the fraudulent text. A hybrid neural network (BiLCNN), built from a bidirectional long short-term memory network (BiLSTM) and a multi-scale convolutional neural network (CNN), then extracts features from the word embedding representation, so that both the temporal features and the local correlation features of the text are fully extracted. Finally, the features are fused and classified by Softmax. Experiments comparing three word embedding models, Word2vec, ELMo (Embedding from Language Model) and BERT, show the superiority of BERT as the input vector. Experimental results on the fraudulent call text data set show that the proposed BERT+BiLCNN model improves classification accuracy by 4.12%, 2.84% and 0.95% over the Word2vec+CNN, ELMo+CNN and BERT+CNN models, respectively.
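
The data flow described in the abstract (embedded tokens, a bidirectional recurrent branch, a multi-scale convolution branch, feature fusion, Softmax) might be sketched roughly as follows. This is a dependency-free toy, not the authors' implementation, and nearly every specific is an assumption: random vectors stand in for BERT outputs, a plain tanh recurrence stands in for the LSTM cell, the two branches are assumed to run in parallel and to be fused by concatenation, and the kernel sizes (2, 3, 4), layer widths, and the two-class output are hypothetical values not given in the abstract.

```python
import math
import random

random.seed(0)  # reproducible toy weights

def rand_matrix(rows, cols):
    """Random weights; stands in for trained parameters (assumption)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def recurrent_pass(tokens, w_in, w_rec):
    """One direction of a plain tanh recurrence (simplified stand-in for an LSTM)."""
    hidden = [0.0] * len(w_rec)
    for tok in tokens:
        pre = [a + b for a, b in zip(matvec(w_in, tok), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return hidden  # final state summarizes the sequence

def conv_max_pool(tokens, kernel_size, filters):
    """1-D convolution over token windows with ReLU and max-over-time pooling."""
    pooled = []
    for weights in filters:
        best = 0.0  # starting at 0.0 folds the ReLU into the max
        for start in range(len(tokens) - kernel_size + 1):
            window = [x for tok in tokens[start:start + kernel_size] for x in tok]
            best = max(best, sum(w * x for w, x in zip(weights, window)))
        pooled.append(best)
    return pooled

def bilcnn_forward(tokens, params):
    # Temporal branch: read the sequence left-to-right and right-to-left.
    forward = recurrent_pass(tokens, *params["fwd"])
    backward = recurrent_pass(tokens[::-1], *params["bwd"])
    # Local branch: one convolution per kernel size ("multi-scale").
    local = []
    for kernel_size, filters in params["conv"].items():
        local += conv_max_pool(tokens, kernel_size, filters)
    # Fusion by concatenation (assumption), then a linear layer and Softmax.
    fused = forward + backward + local
    return softmax(matvec(params["out"], fused))

# Hypothetical sizes, chosen only to keep the toy small.
DIM, HIDDEN, N_FILTERS, N_CLASSES = 8, 4, 3, 2
KERNEL_SIZES = (2, 3, 4)

params = {
    "fwd": (rand_matrix(HIDDEN, DIM), rand_matrix(HIDDEN, HIDDEN)),
    "bwd": (rand_matrix(HIDDEN, DIM), rand_matrix(HIDDEN, HIDDEN)),
    "conv": {k: rand_matrix(N_FILTERS, k * DIM) for k in KERNEL_SIZES},
    "out": rand_matrix(N_CLASSES, 2 * HIDDEN + N_FILTERS * len(KERNEL_SIZES)),
}

tokens = rand_matrix(6, DIM)  # six token vectors standing in for BERT embeddings
probs = bilcnn_forward(tokens, params)
print(probs)  # class probabilities that sum to 1
```

In a real system the token vectors would come from a pretrained BERT encoder and the weights would be learned on the labeled fraud-call data set; the sketch only shows how the two feature branches could be combined before classification.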

Last Update: 2022-11-10