[1] SHI Jian-ting, WU Lin-hao, ZHANG Ying-tao, et al. Research on Error Correction of News Text Based on Soft-Masked BERT[J]. Computer Technology and Development, 2022, 32(05): 202-207. [doi:10.3969/j.issn.1673-629X.2022.05.034]

Research on Error Correction of News Text Based on Soft-Masked BERT

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
32
Issue:
2022, No. 05
Pages:
202-207
Section:
Application Frontiers and General Topics
Publication date:
2022-05-10

Article Info

Title:
Research on Error Correction of News Text Based on Soft-Masked BERT
Article No.:
1673-629X(2022)05-0202-06
Author(s):
SHI Jian-ting¹ (史健婷), WU Lin-hao¹ (吴林皓), ZHANG Ying-tao² (张英涛), CHANG Liang¹ (常亮)
1. Heilongjiang University of Science and Technology, Harbin, Heilongjiang 150022, China;
2. Harbin Institute of Technology, Harbin, Heilongjiang 150000, China
Keywords:
news release; computer-aided technology; deep learning; Chinese text error correction; Soft-Masked BERT
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2022.05.034
Abstract:
In the Internet era, news and publicity work produces a huge volume of text manuscripts every day. Proofreading these drafts entirely by hand is extremely costly and inefficient, so an automatic method for correcting first drafts of news has considerable practical value. Computer-aided review greatly improves proofreading efficiency and substantially reduces labor costs, and a deep learning model further trained on a corpus from a specific news domain can be customized to that domain and achieve better error correction within it. This paper uses Soft-Masked BERT, a new Chinese text error correction model that separates error detection from error correction: the input of the correction network comes from the output of the detection network. The paper aims to improve Soft-Masked BERT and apply it. Ten thousand text sequences from manuscripts on the Harbin Institute of Technology News Network (the HIT News Site dataset) are used as the initial training corpus, and the trained model is then used to proofread Chinese text in related manuscripts from the same site. The results show that Soft-Masked BERT outperforms BERT-Finetune overall on the HIT News Site dataset, improving accuracy by 0.6 percentage points, precision by 1.3 percentage points, recall by 1.5 percentage points, and F1 score by 1.4 percentage points.
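The abstract states that the correction network takes its input from the detection network's output. In the original Soft-Masked BERT formulation (Zhang et al., ACL 2020) that link is a "soft mask": each input embedding e_i is blended with the [MASK] embedding in proportion to the detection network's error probability p_i, giving e'_i = p_i · e_mask + (1 - p_i) · e_i. The minimal sketch below illustrates only that blending step with plain Python lists. The linear-plus-sigmoid `toy_detector`, its weights, the 2-d embeddings, and all function names are hypothetical stand-ins (the original detector is a Bi-GRU and the corrector is BERT), and the modifications made in this article are not reproduced here.

```python
import math
from typing import List

Vector = List[float]


def toy_detector(embeddings: List[Vector], weights: Vector, bias: float) -> List[float]:
    """Hypothetical stand-in for the detection network.

    Scores each token with a linear layer followed by a sigmoid and returns an
    error probability p_i in (0, 1). The real Soft-Masked BERT detector is a Bi-GRU.
    """
    probs = []
    for e_i in embeddings:
        score = sum(w * x for w, x in zip(weights, e_i)) + bias
        probs.append(1.0 / (1.0 + math.exp(-score)))
    return probs


def soft_mask(embeddings: List[Vector], error_probs: List[float],
              mask_embedding: Vector) -> List[Vector]:
    """Soft-masking step: e'_i = p_i * e_mask + (1 - p_i) * e_i.

    A token the detector considers likely wrong (p_i near 1) is pushed towards
    the [MASK] embedding; a likely-correct token (p_i near 0) keeps its own
    embedding. The result would then be fed to the correction network.
    """
    if len(embeddings) != len(error_probs):
        raise ValueError("one error probability is needed per token")
    masked = []
    for e_i, p_i in zip(embeddings, error_probs):
        if not 0.0 <= p_i <= 1.0:
            raise ValueError("error probabilities must lie in [0, 1]")
        if len(e_i) != len(mask_embedding):
            raise ValueError("embedding sizes must match")
        masked.append([p_i * m + (1.0 - p_i) * x for m, x in zip(mask_embedding, e_i)])
    return masked


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-d embeddings for two characters
    mask = [0.5, 0.5]                   # toy [MASK] embedding
    probs = toy_detector(tokens, weights=[-4.0, 4.0], bias=0.0)
    print(probs)                        # second token gets a high error probability
    print(soft_mask(tokens, probs, mask))
```

Because the blend is continuous rather than a hard 0/1 mask, gradients from the correction loss can still flow back into the detector during joint training, which is the usual motivation for this design.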
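The reported gains are absolute differences between two rates, i.e. percentage points, not relative increases: for example, a rise from 80.0% to 81.3% is 1.3 percentage points but only about 1.6% in relative terms. The sketch below shows the standard confusion-matrix definitions of accuracy, precision, recall, and F1 together with both kinds of gain. It assumes a simple sentence-level confusion matrix (the abstract does not state the evaluation granularity), and every count and function name is hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float
    f1: float


def compute_scores(tp: int, fp: int, fn: int, tn: int) -> Scores:
    """Confusion-matrix metrics under an assumed sentence-level convention.

    tp: erroneous sentences the model handled correctly
    fp: sentences the model changed or flagged wrongly
    fn: erroneous sentences the model missed
    tn: correct sentences the model left unchanged
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(accuracy, precision, recall, f1)


def percentage_point_gain(old: float, new: float) -> float:
    """Absolute difference of two rates, expressed in percentage points."""
    return (new - old) * 100.0


def relative_gain_percent(old: float, new: float) -> float:
    """Relative change of a rate, expressed as a percentage of the old value."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    baseline = compute_scores(tp=80, fp=20, fn=20, tn=880)  # hypothetical counts
    print(baseline)
    print(percentage_point_gain(0.800, 0.813), relative_gain_percent(0.800, 0.813))
```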
Last Update: 2022-05-10