WANG Xiao-lin, LU Luo-yong, TAI Wei-peng. Research of a New Algorithm of Words Similarity Based on Information Entropy[J]. Computer Technology and Development, 2015, 25(09): 119-122.

Research of a New Algorithm of Words Similarity Based on Information Entropy

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
25
Issue:
2015, No. 09
Pages:
119-122
Section:
Security and Prevention
Publication date:
2015-09-10

Article Info

Title:
 Research of a New Algorithm of Words Similarity Based on Information Entropy
Article ID:
1673-629X(2015)09-0119-04
Author(s):
 WANG Xiao-lin, LU Luo-yong, TAI Wei-peng
 School of Computer Science and Technology, Anhui University of Technology
Keywords:
 word similarity; HowNet; sememe; information entropy; word-surface similarity
CLC number:
TP301.6
Document code:
A
Abstract:
 Word similarity computation is widely used in natural language processing, but the results it produces are not always reasonable. Based on a study of the three levels of concepts in HowNet (words, senses, and sememes), this paper proposes a new word similarity algorithm that incorporates the concept of entropy from information theory. First, word-surface similarity is introduced to select words from the word set in a reasonable way. Second, the weight of each sememe is balanced according to its information entropy. This suppresses cases where common sememes take up too large a share of a word's sememe set and cause the computed similarity to deviate noticeably from the true relationship. Experimental results show that, compared with traditional methods, the proposed method produces no overly absolute results such as 1.000, which makes the results more reasonable. In addition, the experiment operates on word sets rather than on pairs of words, indicating that the comparison is also more efficient.
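The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two ideas it describes, built on stated assumptions rather than the authors' method. Three stand-ins are used. First, a character-level Dice coefficient substitutes for word-surface similarity. Second, smoothed self-information, -log2(p), where p is the Laplace-smoothed fraction of words in the set whose sememe set contains the sememe, substitutes for the sememe information-entropy weight. Third, a weighted Jaccard overlap substitutes for the sememe-set similarity. `WORD_SEMEMES` is hypothetical toy data, not real HowNet entries, and every function name is illustrative.

```python
import math
from collections import Counter

# Hypothetical toy sememe sets for illustration only; NOT taken from HowNet.
WORD_SEMEMES = {
    "医生": {"human", "occupation", "medical", "cure"},      # doctor
    "护士": {"human", "occupation", "medical", "care"},      # nurse
    "教师": {"human", "occupation", "education", "teach"},   # teacher
    "苹果": {"fruit", "food", "reproduce"},                  # apple
}


def surface_similarity(w1: str, w2: str) -> float:
    """Character-level Dice coefficient, an assumed stand-in for word-surface similarity."""
    a, b = set(w1), set(w2)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def sememe_weights(word_sememes: dict[str, set[str]]) -> dict[str, float]:
    """Weight each sememe by smoothed self-information over the word set.

    Sememes shared by many words get a low weight, so they cannot dominate
    a similarity score. Laplace-style smoothing keeps every weight > 0.
    """
    n = len(word_sememes)
    counts = Counter(s for sems in word_sememes.values() for s in sems)
    return {s: -math.log2((c + 1) / (n + 2)) for s, c in counts.items()}


def weighted_similarity(s1: set[str], s2: set[str], weights: dict[str, float]) -> float:
    """Weighted Jaccard overlap: weight of shared sememes / weight of all sememes."""
    total = sum(weights.get(s, 0.0) for s in s1 | s2)
    if total == 0.0:
        return 0.0
    return sum(weights.get(s, 0.0) for s in s1 & s2) / total


weights = sememe_weights(WORD_SEMEMES)
print(round(weighted_similarity(WORD_SEMEMES["医生"], WORD_SEMEMES["护士"], weights), 3))
print(surface_similarity("教师", "医师"))  # 医师 = physician; shares the character 师
```

For these toy sets, weighting lowers the doctor/nurse score from an unweighted Jaccard of 0.6 to about 0.41. The reason is that "human" and "occupation" appear in three of the four words and so carry little weight. A real implementation would draw sememe sets from HowNet and use whatever entropy formula the paper defines.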

Last Update: 2015-10-16