Status Analysis of Construction of Tibetan Emotional Dictionary - Computer Technology and Development

Article Info

Title:: Status Analysis of Construction of Tibetan Emotional Dictionary

Article ID:: 1673-629X(2024)03-0009-06

Author(s):: TSHERING Dondrub (才让东知) 1,2; YANG Jie (杨杰) 1,2; NYIMA Trashi (尼玛扎西) 1,2,3
1. Engineering Research Center of Tibetan Information Technology of Ministry of Education, Lhasa 850000, China;
2. School of Information Science and Technology, Tibet University, Lhasa 850000, China;
3. Collaborative Innovational Center for Tibet Informatization, Lhasa 850000, China

Keywords:: Tibetan emotional lexicon; emotional word classification; lexicon construction method; vocabulary size; vocabulary composition

CLC number:: TP391.1

DOI:: 10.3969/j.issn.1673-629X.2024.03.002

Abstract:: In recent years, many researchers have confirmed that multi-feature fusion sentiment analysis methods based on deep learning mine the emotional information in text better than pure deep learning methods, and that emotional-word features are among the most important of these features. A small number of Tibetan emotional lexicons exist, but almost none are publicly available, so anyone who wants to use such a resource has to build it themselves. Studying the current state of Tibetan emotional lexicon construction can therefore help subsequent construction efforts. To understand the research status of emotional-word classification methods, commonly used lexicon construction methods, and the vocabulary size and composition of existing Tibetan emotional lexicons, we analyze the literature of the past 10 years on Tibetan emotional lexicon construction (mainly from CNKI) using comparative and statistical methods, and summarize the state of research. We find that the main classification schemes for emotional words are 7 categories with 21 subcategories, 12 categories with 20 subcategories, and 2 categories with 18 subcategories. Construction methods include lexicon matching, machine translation, SO-PMI expansion, and similarity-based expansion using word2vec or BERT. Existing Tibetan emotional lexicons contain roughly 5,000 to 28,000 entries, close to the level of Chinese emotional lexicons, and consist mainly of emotional words, degree adverbs, negation words, double-negation words, and emoticons. We hope this provides a reference for researchers and for those building Tibetan emotional lexicons.
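As an illustration of the SO-PMI expansion named in the abstract, the following is a minimal sketch and not the procedure of any surveyed paper. It assumes sentence-level co-occurrence counts, additive smoothing for unseen pairs, and pre-tokenized input. The function name `so_pmi`, the `smoothing` parameter, and the English placeholder tokens are all hypothetical; a real Tibetan corpus would first need a word segmenter.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(sentences, pos_seeds, neg_seeds, smoothing=0.5):
    """Score non-seed words: sum of PMI with positive seeds minus sum with negative seeds."""
    n = len(sentences)
    word_count = Counter()   # number of sentences containing each word
    pair_count = Counter()   # number of sentences containing both words of a pair
    for tokens in sentences:
        unique = sorted(set(tokens))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))  # pairs are stored in sorted order

    def pmi(a, b):
        pair = (a, b) if a < b else (b, a)
        # Additive smoothing (an assumption) keeps the log defined for unseen pairs.
        joint = pair_count[pair] + smoothing
        return math.log2(joint * n / (word_count[a] * word_count[b]))

    seeds = set(pos_seeds) | set(neg_seeds)
    scores = {}
    for word in word_count:
        if word in seeds:
            continue
        pos = sum(pmi(word, s) for s in pos_seeds if s in word_count)
        neg = sum(pmi(word, s) for s in neg_seeds if s in word_count)
        scores[word] = pos - neg
    return scores


# Toy corpus with placeholder tokens (hypothetical data, for illustration only).
corpus = [
    ["good", "delightful", "day"],
    ["good", "delightful"],
    ["bad", "dreadful", "day"],
    ["bad", "dreadful"],
]
scores = so_pmi(corpus, pos_seeds=["good"], neg_seeds=["bad"])
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:+.2f}")
```

Words scoring clearly above zero are candidates for the positive part of the lexicon and words clearly below zero for the negative part. A threshold and a manual check would normally be applied before adding them.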

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]
