[1] GAO Jia-qi, ZHAO Qing-cong. Study on Word Segmentation Method of Classical Literature Based on New Word Discovery[J]. Computer Technology and Development, 2021, 31(09): 178-181. [doi:10.3969/j.issn.1673-629X.2021.09.030]

Study on Word Segmentation Method of Classical Literature Based on New Word Discovery

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
2021, No. 09
Pages:
178-181
Column:
Application Frontiers and General Topics
Publication date:
2021-09-10

Article Info

Title:
Study on Word Segmentation Method of Classical Literature Based on New Word Discovery
Article ID:
1673-629X(2021)09-0178-04
Author(s):
GAO Jia-qi (高嘉琦)1, ZHAO Qing-cong (赵庆聪)1,2
1. School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China;
2. Beijing Key Laboratory of Big Data Decision for Green Development, Beijing 100192, China
Keywords:
classical literature; new word discovery; word segmentation; mutual information; left-right entropy
CLC number:
TP301
DOI:
10.3969/j.issn.1673-629X.2021.09.030
Abstract:
Most existing Chinese word segmentation methods and technologies target modern Chinese, for which segmentation methods and systems are already mature, whereas ancient Chinese has received little study. Because of the particular characteristics of ancient Chinese texts, applying modern Chinese segmentation methods directly to them does not yield accurate segmentation, and research on ancient Chinese segmentation has not yet formed a mature system. This paper proposes a word segmentation method for classical literature based on new word discovery: new words are discovered from a large corpus of classical literary works, an ancient Chinese segmentation dictionary is built from them, and ancient texts are then segmented on that basis. Taking the text of "Romance of the Three Kingdoms" (《三国演义》) as an example, the paper verifies that the proposed method can effectively improve the accuracy of ancient text segmentation.
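The pipeline summarized in the abstract (discover new words, build a dictionary, then segment) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a common formulation of the two measures named in the keywords: cohesion is scored as the minimum pointwise mutual information over all binary splits of a candidate, and boundary freedom as the smaller of the left and right neighbor entropies. The function names, thresholds, and toy string are hypothetical. Forward maximum matching is used as a stand-in segmenter, since the abstract does not say which segmenter the paper uses.

```python
import math
from collections import Counter, defaultdict


def branching_entropy(neighbor_counts):
    """Shannon entropy (in nats) of a neighbor-character distribution."""
    total = sum(neighbor_counts.values())
    return -sum((c / total) * math.log(c / total) for c in neighbor_counts.values())


def discover_words(text, max_len=4, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Return {candidate: (pmi, entropy)} for n-grams passing both thresholds.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(text)
    counts = Counter()
    left = defaultdict(Counter)   # characters seen immediately before each n-gram
    right = defaultdict(Counter)  # characters seen immediately after each n-gram
    for size in range(1, max_len + 1):
        for i in range(n - size + 1):
            gram = text[i:i + size]
            counts[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + size < n:
                right[gram][text[i + size]] += 1

    def prob(gram):
        # Relative frequency among all n-grams of the same length.
        return counts[gram] / (n - len(gram) + 1)

    found = {}
    for gram, count in counts.items():
        if len(gram) < 2 or count < min_count:
            continue
        # Cohesion: the weakest binary split decides the score.
        pmi = min(
            math.log(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        # Boundary freedom: the less varied side decides the score.
        entropy = min(branching_entropy(left[gram]), branching_entropy(right[gram]))
        if pmi >= min_pmi and entropy >= min_entropy:
            found[gram] = (pmi, entropy)
    return found


def segment(text, dictionary, max_len=4):
    """Forward maximum matching; unknown characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy string (not from the paper's corpus): "曹操" recurs with varied neighbors.
sample = "甲曹操乙丙曹操丁戊曹操己庚曹操辛"
words = discover_words(sample)
tokens = segment("甲曹操乙", words)
```

On a real corpus the thresholds would need tuning, and the discovered words would typically be merged into an existing segmentation dictionary rather than used alone.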
Last Update: 2021-09-10