[1] Kuwatebaike·MAMUTI. Research on Kazakh Stemming Based on Machine Learning[J]. Computer Technology and Development, 2020, 30(04): 182-188. [doi:10.3969/j.issn.1673-629X.2020.04.035]
Research on Kazakh Stemming Based on Machine Learning
Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]
- Volume: 30
- Issue: 2020, No. 04
- Pages: 182-188
- Section: Application Development Research
- Publication date: 2020-04-10
Article Information
- Title: Research on Kazakh Stemming Based on Machine Learning
- Article number: 1673-629X(2020)04-0182-07
- Author: Kuwatebaike·MAMUTI (库瓦特拜克·马木提)
- Affiliation: School of Electronic and Information Engineering, Yili Normal University, Yining, Xinjiang 835000, China
- Keywords: stemming; statistical learning model; maximum entropy model; conditional random field model
- CLC number: TP391
- DOI: 10.3969/j.issn.1673-629X.2020.04.035
- Abstract: Word processing is a fundamental task in natural language processing, and its output directly affects the performance of downstream tasks. A Kazakh word consists of a stem and inflectional suffixes: the stem carries the word's core meaning, while the inflectional suffixes carry morphological and syntactic information. Stemming is therefore the basis for effective Kazakh language processing. This paper builds a Kazakh stemming corpus, treats Kazakh stemming as a sequence labeling problem, and proposes an effective labeling scheme for Kazakh words. Comparative stemming experiments are then conducted with a maximum entropy model and a conditional random field (CRF) model. The results show that CRF-based stemming is 15% more accurate than the best existing Kazakh stemming system, and that the proposed method improves on rule-based approaches to Kazakh stemming.
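The abstract frames stemming as sequence labeling but does not spell out the tag set or features. The sketch below is only an illustration under stated assumptions. It uses a hypothetical two-tag scheme (S for stem characters, A for affix characters), a romanized example word, and illustrative feature names, none of which come from the paper. It shows how a (word, stem) training pair becomes one tag per character, how a predicted tag sequence maps back to a stem/suffix split, and which per-character context features a maximum entropy or CRF tagger could be trained on.

```python
from typing import Dict, List, Tuple

# Hypothetical two-tag scheme (an assumption, not necessarily the paper's):
#   "S" = character belongs to the stem, "A" = character belongs to an affix.
STEM, AFFIX = "S", "A"


def word_to_tags(word: str, stem: str) -> List[str]:
    """Turn a (word, stem) training pair into one tag per character.

    Kazakh is suffixing, so the stem is assumed to be a prefix of the word.
    """
    if not word.startswith(stem):
        raise ValueError(f"stem {stem!r} is not a prefix of {word!r}")
    return [STEM] * len(stem) + [AFFIX] * (len(word) - len(stem))


def tags_to_split(word: str, tags: List[str]) -> Tuple[str, str]:
    """Recover (stem, affixes) from a predicted tag sequence.

    The stem is the run of leading "S" tags, so an inconsistent prediction
    such as S A S still yields a single stem/affix boundary.
    """
    cut = 0
    while cut < len(tags) and tags[cut] == STEM:
        cut += 1
    return word[:cut], word[cut:]


def char_features(word: str, i: int, window: int = 2) -> Dict[str, str]:
    """Character-window features for position i (illustrative feature names).

    Positions outside the word are filled with the placeholder "<pad>".
    """
    feats = {"char": word[i], "rest_of_word": word[i:]}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"char[{off:+d}]"] = word[j] if 0 <= j < len(word) else "<pad>"
    return feats


if __name__ == "__main__":
    # Romanized illustration: "our books" = kitap (book) + tar + ymyz.
    word, stem = "kitaptarymyz", "kitap"
    tags = word_to_tags(word, stem)
    print("".join(tags))              # SSSSSAAAAAAA
    print(tags_to_split(word, tags))  # ('kitap', 'tarymyz')
    print(char_features(word, 5))
```

Per-character feature dicts of this shape match the input that CRF toolkits such as sklearn-crfsuite accept (one list of dicts per word, paired with one list of tags). The paper's actual tag set and feature templates may differ.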
Last Update: 2020-04-10