[1] TSHERING Dondrub, YANG Jie, NYIMA Trashi. Status Analysis of Construction of Tibetan Emotional Dictionary[J]. Computer Technology and Development, 2024, 34(03): 9-14. [doi: 10.3969/j.issn.1673-629X.2024.03.002]

Status Analysis of Construction of Tibetan Emotional Dictionary
Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
34
Issue:
2024, No. 03
Pages:
9-14
Column:
Review
Publication date:
2024-03-10

Article Information

Title:
Status Analysis of Construction of Tibetan Emotional Dictionary
Article ID:
1673-629X(2024)03-0009-06
Author(s):
TSHERING Dondrub (1, 2), YANG Jie (1, 2), NYIMA Trashi (1, 2, 3)
1. Engineering Research Center of Tibetan Information Technology of Ministry of Education, Lhasa 850000, China;
2. School of Information Science and Technology, Tibet University, Lhasa 850000, China;
3. Collaborative Innovational Center for Tibet Informatization, Lhasa 850000, China
Keywords:
Tibetan emotional lexicon; emotional word classification; lexicon construction method; vocabulary size; vocabulary composition
CLC number:
TP391.1
DOI:
10.3969/j.issn.1673-629X.2024.03.002
Abstract:
In recent years, many researchers have confirmed that multi-feature fusion sentiment analysis methods based on deep learning mine the emotional information in text better than pure deep learning methods, and that emotional word features are among the most important of these features. A small number of Tibetan emotional lexicons exist, but almost none are publicly available, so anyone who wants to use such a resource has to build it. Studying how Tibetan emotional lexicons have been constructed so far can therefore help later construction efforts. To understand the current state of research on emotional word classification schemes, commonly used lexicon construction methods, and the vocabulary size and composition of existing Tibetan emotional lexicons, we analyze the literature of the past 10 years on Tibetan emotional lexicon construction (mainly from CNKI) using comparative and statistical methods, and summarize the state of research. We find that the main classification schemes for emotional words are 7 major categories with 21 subcategories, 12 major categories with 20 subcategories, and 2 major categories with 18 subcategories. Construction methods include lexicon matching, machine translation, SO-PMI expansion, and similarity-based expansion using word2vec or BERT. Existing Tibetan emotional lexicons contain roughly 5,000 to 28,000 entries, close to the level of Chinese emotional lexicons, and consist mainly of emotional words, degree adverbs, negation words, double-negation words, and emoticons. We hope this provides a reference for researchers and for those building Tibetan emotional lexicons.
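As an illustration of the SO-PMI expansion named in the abstract, the following is a minimal sketch rather than the procedure of any surveyed work, since the abstract does not say how those works compute it. It scores each candidate word by its summed pointwise mutual information with positive seed words minus that with negative seed words, using document-level co-occurrence. The function name `so_pmi`, the additive `smoothing` constant, and the toy English tokens are all assumptions made for the example; real use would need a segmented Tibetan corpus and Tibetan seed words.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Return {word: SO-PMI score} for every non-seed word in the corpus.

    docs is a list of token lists. Co-occurrence means "appears in the
    same document". smoothing is an assumed additive constant that keeps
    the logarithm defined when two words never co-occur.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    def pmi(a, b):
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), document-level estimates
        joint = pair_df[tuple(sorted((a, b)))] + smoothing
        return math.log2(joint * n_docs / (word_df[a] * word_df[b]))

    # Seeds that never occur in the corpus carry no information; skip them.
    pos = [s for s in pos_seeds if word_df[s] > 0]
    neg = [s for s in neg_seeds if word_df[s] > 0]
    seeds = set(pos_seeds) | set(neg_seeds)

    return {
        w: sum(pmi(w, p) for p in pos) - sum(pmi(w, n) for n in neg)
        for w in word_df
        if w not in seeds
    }


if __name__ == "__main__":
    # Toy corpus with placeholder English tokens (an assumption for the demo).
    corpus = [
        ["good", "great", "film"],
        ["good", "great", "actor"],
        ["bad", "awful", "film"],
        ["bad", "awful", "plot"],
    ]
    scores = so_pmi(corpus, pos_seeds={"good"}, neg_seeds={"bad"})
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{score:+.3f}")
```

Words with a clearly positive score would be candidates for the positive part of the lexicon and words with a clearly negative score for the negative part. The cut-off threshold and the balance between the two seed sets are design choices this sketch leaves open.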
Last Update: 2024-03-10