[1] ZHAO Wei, DENG Ye-xun, ZHAO Jian-qiang*, et al. Research on Chinese Advertisement Text Recognition Based on Enhanced Semantic[J]. Computer Technology and Development, 2021, 31(03): 65-69. [doi:10.3969/j.issn.1673-629X.2021.03.011]
Research on Chinese Advertisement Text Recognition Based on Enhanced Semantic
Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]
- Volume: 31
- Issue: 2021, No. 03
- Pages: 65-69
- Column: Big Data Analysis and Mining
- Publication date: 2021-03-10
Article Info
- Title: Research on Chinese Advertisement Text Recognition Based on Enhanced Semantic
- Article ID: 1673-629X(2021)03-0065-05
- Author(s): ZHAO Wei1; DENG Ye-xun2; ZHAO Jian-qiang2,3*; LI Wen-rui1; HAN Bing1; OU Rong-an1
  1. Guangzhou Institute of Criminal Science and Technology, Guangzhou 510030, China;
  2. Xiamen Meiya Pico Information Co., Ltd., Xiamen 361008, China;
  3. Xidian University, Xi'an 710071, China
- Keywords: advertising text classification; semantic enhancement; feature fusion; pre-training; attention mechanism
- Classification number: TP391.1
- DOI: 10.3969/j.issn.1673-629X.2021.03.011
- Abstract: The Internet is an important medium for advertising, but it is also flooded with low-quality, fraudulent, and illegal advertisements that seriously pollute cyberspace. Effectively identifying malicious advertisements is therefore of great significance for building a safe and clean network environment. To address the need to recognize the various kinds of illegal and non-compliant Chinese advertising content, we use Bert (bidirectional encoder representations from transformers) and Word2vec to extract character-level and word-level embedding features, respectively. A CNN (convolutional neural network) performs deep extraction on the high-level Bert features, while the word-level feature vectors are fed into a bidirectional LSTM (long short-term memory) network to extract global semantics, and an attention mechanism strengthens these semantic features. The enhanced features are then fused with the Bert character-level features, making full use of the semantic representation advantages of dynamic and static word vectors. On this basis we propose CARES (Chinese advertisement text recognition based on enhanced semantic), a Chinese advertisement recognition model. Experiments on a real social chat text dataset show that, compared with text classification models based on convolutional and recurrent neural networks, CARES achieves the best classification performance and recognizes advertising content in social chat text more accurately, reaching a recognition accuracy of 97.73%.
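The attention and fusion steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the form of attention or the fusion operator, so dot-product scoring against a query vector and fusion by concatenation are assumptions here, and the function names `attention_pool` and `fuse`, the query vector, and all numbers are hypothetical toy values standing in for BiLSTM hidden states and CNN-over-Bert features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query) and return the weighted sum.

    Assumption: dot-product scoring against a single query vector; the
    abstract does not state which attention variant CARES uses.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


def fuse(char_features, word_features):
    """Fuse the two feature vectors by concatenation (an assumed operator)."""
    return list(char_features) + list(word_features)


# Toy stand-ins: three time steps of 2-d "BiLSTM" states (word level)
# and a 3-d "CNN over Bert" feature vector (character level).
bilstm_states = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.8]]
query = [1.0, 0.0]
cnn_features = [0.5, -0.2, 0.7]

word_vec, weights = attention_pool(bilstm_states, query)
fused = fuse(cnn_features, word_vec)
print(weights)
print(fused)
```

In a full model the fused vector would feed a classifier layer that decides whether the text is an advertisement; the sketch stops at the fusion step because the abstract gives no further detail.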
Last Update: 2020-03-10