Evaluation of Non-general Language Translation Based on Synonyms Expansion - Computer Technology and Development (《计算机技术与发展》)

Article Info

Title:: Evaluation of Non-general Language Translation Based on Synonyms Expansion

Author(s):: MA Wen-qian; WANG Li-qing; WANG Juan; CHEN Bao-tong (School of Information Science and Engineering, Yunnan University, Kunming 650091, China)

Keywords:: non-general language; translation evaluation; synonym extension; BLEU algorithm; GloVe model

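Abstract (translated from the Chinese):: When the BLEU algorithm is used to score translation quality for non-general languages, the languages' limited range of use and the narrow channels for collecting corpora make it hard to obtain high-quality reference translations at sufficient scale, so BLEU wrongly assigns low scores to different wordings of the same meaning. To address this, the paper analyzes and extracts near-synonyms from the reference translation in advance and expands it into multiple reference translations that cover as many phrasings as possible. A threshold filters the extracted near-synonyms and excludes low-quality ones, so that the expansion does not lower the quality of the references; the quality evaluation of the non-general language translation is then carried out on the expanded references. In the experiments, taking Thai as an example, the GloVe and Word2vec models are each used for corpus training and near-synonym extraction, followed by reference expansion and BLEU evaluation. The results show that, for quality evaluation with insufficient reference corpora, as with non-general languages, the method effectively improves evaluation accuracy and reduces the misjudgment rate.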

Abstract:: When the BLEU algorithm is used to evaluate translation quality for non-general languages, it is difficult to obtain a high-quality non-general language corpus of sufficient scale to serve as reference translations, owing to the languages' limited scope of use and the limitations of corpus collection channels. As a result, BLEU assigns erroneously low scores when the same meaning is expressed in different ways. To address this, we analyze and extract synonyms from the reference translation in advance and expand it into multiple reference translations, covering as many translation expressions as possible. A threshold is also set to filter the extracted synonyms and exclude low-quality ones, so that the expansion does not degrade the quality of the reference translations. The quality of the non-general language translation is then evaluated against the expanded references. In the experiments, taking Thai as an example, we use the GloVe and Word2vec models for corpus training, synonym extraction, reference translation expansion, and BLEU evaluation. The results show that, when reference corpora are insufficient, as is the case for non-general languages, the proposed method effectively improves evaluation accuracy and reduces the misjudgment rate.
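The pipeline outlined in the abstract (threshold-filtered synonym extraction, reference expansion, multi-reference BLEU) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy vectors in `VECTORS` stand in for trained GloVe or Word2vec embeddings, English tokens replace Thai for readability, the names `near_synonyms`, `expand_references`, and `bleu` and the threshold value 0.95 are hypothetical, and substituting one word at a time is an assumed expansion strategy. The BLEU function is a simplified variant limited to unigrams and bigrams.

```python
import math
from collections import Counter

# Toy word vectors standing in for trained GloVe/Word2vec embeddings
# (hypothetical values chosen only for illustration).
VECTORS = {
    "big":   (0.9, 0.1, 0.0),
    "large": (0.85, 0.15, 0.05),
    "small": (0.1, 0.9, 0.0),
    "house": (0.0, 0.2, 0.9),
    "home":  (0.05, 0.25, 0.85),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def near_synonyms(word, threshold):
    """Vocabulary words whose similarity to `word` reaches the threshold."""
    if word not in VECTORS:
        return []
    return [w for w, vec in VECTORS.items()
            if w != word and cosine(VECTORS[word], vec) >= threshold]

def expand_references(reference, threshold=0.95):
    """Original reference plus variants with a single word substituted
    (assumed strategy; the paper's exact procedure is not given here)."""
    refs = [reference]
    for i, word in enumerate(reference):
        for syn in near_synonyms(word, threshold):
            refs.append(reference[:i] + [syn] + reference[i + 1:])
    return refs

def ngrams(tokens, n):
    """Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=2):
    """Simplified multi-reference BLEU: clipped n-gram precision up to
    `max_n`, geometric mean, brevity penalty, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        max_ref = Counter()  # highest count of each n-gram in any reference
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Reference length closest to the candidate length (ties: the shorter).
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = ["the", "big", "house"]
candidate = ["the", "large", "house"]
single_score = bleu(candidate, [reference])
expanded_score = bleu(candidate, expand_references(reference))
```

With the single reference, the candidate shares no bigram with "the big house", so the unsmoothed score collapses to zero even though the meaning is the same. After expansion, "the large house" appears among the references and the candidate is scored as a full match. Raising the threshold shrinks the synonym set and therefore the number of generated references, which is the quality filter the abstract describes.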