
[1] ZONG Zhong. Study of Segmentation Algorithm of Dictionary Mechanism Orienting Chinese Information Retrieval[J]. Computer Technology and Development, 2014, 24(04): 118-121.

Study of Segmentation Algorithm of Dictionary Mechanism Orienting Chinese Information Retrieval (中文信息检索中词典机制分词算法的研究)


Computer Technology and Development (计算机技术与发展) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:: 24
Issue:: 2014, No. 04

Pages:: 118-121

Column:: Intelligence, Algorithms, and Systems Engineering

Publication date:: 2014-04-30

Article Info

Title:: Study of Segmentation Algorithm of Dictionary Mechanism Orienting Chinese Information Retrieval

Article ID:: 1673-629X(2014)04-0118-04

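Author and affiliation:: ZONG Zhong (宗中); Jiangsu Posts and Telecommunications Planning and Designing Institute Co., Ltd. (江苏省邮电规划设计院有限公司)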

Author(s):: ZONG Zhong

Keywords:: information retrieval; Chinese word segmentation; data structures; hash

CLC number:: TP301.6

Document code:: A

Abstract:: Automatic Chinese word segmentation is the foundation of information retrieval in search engines, and the segmentation dictionary is a key component of any Chinese word segmentation system: the speed at which the dictionary loads and answers queries directly affects the speed of the whole system. Building on a study of traditional dictionary mechanisms, this paper analyzes the weakness of the double-character hash dictionary mechanism in handling the part of an entry that remains after its first two characters, and proposes an improved double-character hash dictionary mechanism. The improved algorithm is then tested for precision, recall, and segmentation speed. The results show that it improves the speed and efficiency of entry matching without increasing the maintenance complexity of existing typical dictionary mechanisms.
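
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a conventional double-character hash dictionary combined with forward maximum matching. It is not the improved mechanism proposed in the paper, whose details are not given on this page. The class name `DoubleCharHashDict`, the function `forward_max_match`, the sorted-suffix layout, and the sample vocabulary are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


class DoubleCharHashDict:
    """Illustrative baseline: hash on the first two characters of an entry,
    then binary-search the remaining suffix in a sorted list (assumed layout)."""

    def __init__(self, words):
        self.single = set()   # one-character entries
        self.max_len = 1      # longest entry, bounds the matching window
        buckets = defaultdict(lambda: defaultdict(set))
        for w in words:
            if not w:
                continue
            self.max_len = max(self.max_len, len(w))
            if len(w) == 1:
                self.single.add(w)
            else:
                # An empty suffix "" marks a two-character entry.
                buckets[w[0]][w[1]].add(w[2:])
        # Freeze every bucket into a sorted list so it can be binary-searched.
        self.index = {
            c1: {c2: sorted(sfx) for c2, sfx in second.items()}
            for c1, second in buckets.items()
        }

    def __contains__(self, word):
        if not word:
            return False
        if len(word) == 1:
            return word in self.single
        suffixes = self.index.get(word[0], {}).get(word[1])
        if suffixes is None:
            return False
        rest = word[2:]
        i = bisect_left(suffixes, rest)
        return i < len(suffixes) and suffixes[i] == rest


def forward_max_match(text, dictionary):
    """Greedy left-to-right segmentation: try the longest candidate first and
    fall back to a single character when nothing in the dictionary matches."""
    tokens = []
    i = 0
    while i < len(text):
        for n in range(min(dictionary.max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in dictionary:
                tokens.append(text[i:i + n])
                i += n
                break
    return tokens


# Hypothetical sample vocabulary, not taken from the paper.
vocab = ["中文", "信息", "检索", "信息检索", "分词", "词典"]
d = DoubleCharHashDict(vocab)
result = forward_max_match("中文信息检索分词词典", d)
print(result)
```

In this layout the two hash lookups are constant-time, so the cost of a query is dominated by the search over the remaining suffixes, which is the part of the lookup the abstract identifies as the weak point of the double-character hash mechanism.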

