English-Chinese Intelligent Translation of Computer Algorithm Corpus - Computer Technology and Development

Article Info

Title:: English-Chinese Intelligent Translation of Computer Algorithm Corpus

Article ID:: 1673-629X(2021)07-0176-06

Author(s):: CHEN Jia-le (陈家乐); ZHANG Yan-ling (张艳玲); Faculty of Computer Science and Network Engineering, Guangzhou University, Guangzhou 510006, China

Keywords:: machine translation; Word2Vec algorithm; word vector; text similarity; GNMT

CLC number:: TP18

DOI:: 10.3969/j.issn.1673-629X.2021.07.029

Abstract:: The free online translation systems currently available on the Internet are neural machine translation models trained on general-purpose corpora. They translate well in general contexts, but in specific vertical domains (such as computer science) neither the training text nor the training procedure is targeted at the domain, so the output contains wrong or missing technical terms and is hard to read. Demand for automated machine translation in specific vertical domains is therefore growing. In this work, English-Chinese bilingual example sentences related to computer algorithms are collected with a web crawler; word vectors carrying context information are generated with the Word2Vec algorithm and embedded into Google's open-source GNMT model to train an English-Chinese translation model, on which a simple translation program is built. Controlled experiments examine how the word-vector length in Word2Vec affects both the computed text similarity between words and the GNMT training results, and how the GNMT hyperparameters num_unit (number of hidden-layer units) and batch_size affect training. The best English-Chinese translation model is then trained on the basis of the combined experimental results.
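The abstract refers to computing text similarity between words from their word vectors. The record does not say which measure the authors used; the sketch below assumes cosine similarity, a common choice for Word2Vec vectors. The 4-dimensional vectors and the helper names (`cosine_similarity`, `most_similar`, `toy_vectors`) are made up for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (assumption): real Word2Vec vectors would be
# learned from the crawled corpus and have a configurable length.
toy_vectors = {
    "queue": [0.9, 0.1, 0.3, 0.0],
    "stack": [0.8, 0.2, 0.4, 0.1],
    "banana": [0.0, 0.9, 0.0, 0.7],
}


def most_similar(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("queue", toy_vectors):
        print(f"queue vs {other}: {score:.3f}")
```

With vectors of different lengths trained on the same corpus, the same ranking function could be used to compare how the vector length changes which terms come out as nearest neighbours, which is the kind of comparison the abstract describes.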


Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]
