[1] GAO Guo-qiang, HUANG Lü-wei, CHEN Feng-yu. Calculation of Chinese Words Semantic Similarity Using Network Search Engines[J]. Computer Technology and Development, 2014, 24(07): 84-87.

 Calculation of Chinese Words Semantic Similarity Using Network Search Engines
Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
24
Issue:
2014, No. 07
Pages:
84-87
Column:
Intelligence, Algorithms, Systems Engineering
Publication date:
2014-07-10

Article Info

Title:
 Calculation of Chinese Words Semantic Similarity Using Network Search Engines
Article ID:
1673-629X(2014)07-0084-04
Author(s):
 GAO Guo-qiang (高国强), HUANG Lü-wei (黄吕威), CHEN Feng-yu (陈丰钰)
 Wuhan Textile University, 传媒学
Keywords:
 similarity; search engines; lexicon
CLC number:
TP301.6
Document code:
A
Abstract:
 Computing the semantic similarity of Chinese words is a key problem in Chinese information processing. This paper measures the semantic similarity between pairs of Chinese words using the information returned by web search engines. First, a program queries a search engine to obtain the result counts for the words, and these page counts are used to implement a similarity model named WebPMI. Next, a second model named CODC is described, which analyzes semantic relatedness using the text snippets returned by the queries. Finally, the two models are combined and pseudocode for the resulting algorithm is given. Experimental results show that the algorithm makes good use of web information, outperforms the existing web-based semantic similarity measures for Chinese, and comes close to the traditional similarity measures that rely on a lexicon.
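The page-count step can be illustrated with a minimal sketch of a WebPMI-style score. The abstract does not give the paper's formula, so this assumes the pointwise mutual information form commonly used in the web-similarity literature: the score is zero when the two words co-occur too rarely, and otherwise the base-2 logarithm of the joint page-count probability divided by the product of the individual probabilities. The function name `web_pmi`, the index size `total_pages`, and the threshold `min_cooccurrence` are illustrative assumptions, not values taken from the paper, and the page counts are passed in directly rather than fetched from a search engine.

```python
import math


def web_pmi(count_p, count_q, count_pq,
            total_pages=10**10, min_cooccurrence=5):
    """WebPMI-style score from search-engine page counts (assumed form).

    count_p, count_q: page counts for each word queried alone.
    count_pq: page count for the query containing both words.
    total_pages: assumed number of indexed pages (hypothetical default).
    min_cooccurrence: counts at or below this are treated as noise
        (hypothetical default).
    """
    # Rare co-occurrence or missing counts: treat the words as unrelated.
    if count_pq <= min_cooccurrence or count_p <= 0 or count_q <= 0:
        return 0.0
    joint = count_pq / total_pages
    prob_p = count_p / total_pages
    prob_q = count_q / total_pages
    # Pointwise mutual information in bits.
    return math.log2(joint / (prob_p * prob_q))


# Hypothetical page counts, for illustration only.
score = web_pmi(count_p=1_000_000, count_q=1_000_000, count_pq=10_000)
print(score)
```

A higher score means the two words appear together more often than their individual frequencies would predict. The snippet-based CODC model and the rule for combining the two scores are not reproduced here, because the abstract does not specify them.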


Last Update: 2015-03-13