[1] WU Hong-mei, NIU Yun. Protein-protein Interaction Identification Based on POS Weighted and Word Similarity[J]. Computer Technology and Development, 2015, 25(12): 6-9.

 Protein-protein Interaction Identification Based on POS Weighted and Word Similarity

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
25
Issue:
No. 12, 2015
Pages:
6-9
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2015-12-10

Article Info

Title:
 Protein-protein Interaction Identification Based on POS Weighted and Word Similarity
Article ID:
1673-629X(2015)12-0006-04
Author(s):
 WU Hong-mei, NIU Yun
 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
Keywords:
 large-scale corpus; protein-protein interaction; POS weighting; word similarity
CLC number:
TP391
Document code:
A
Abstract:
 Unlike the vast majority of existing methods for automatic protein-protein interaction (PPI) identification, which rely on a single sentence, this paper proposes a PPI identification method based on a large-scale corpus that introduces two factors: syntax and word similarity. First, a feature-based method is used to classify protein-pair signatures. Then, a segmentation tool is used to assign part-of-speech (POS) tags to the protein-pair signatures; the feature words are grouped by POS, and each POS is assigned a weight. Finally, word similarity is computed from the large-scale corpus, and the word similarity matrix is adjusted according to the difference in each word's frequency between the positive and negative classes. The experimental results show that, once POS weighting and word similarity are introduced, the final classification accuracy is clearly higher than that of the baseline model.
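
The weighting and similarity steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the POS weights nor the adjustment formula, so the `POS_WEIGHTS` values, the `class_bias` measure, and the shrinking rule in `adjust_similarity` are all assumptions made for illustration, as are the function names.

```python
from collections import Counter

# Hypothetical per-POS weights; the abstract does not state the actual values.
POS_WEIGHTS = {"VERB": 2.0, "NOUN": 1.0, "ADJ": 0.5}


def pos_weighted_vector(tagged_tokens, pos_weights, default=0.0):
    """Bag-of-words vector in which each occurrence counts as its POS weight."""
    vec = Counter()
    for word, pos in tagged_tokens:
        weight = pos_weights.get(pos, default)
        if weight:  # words whose POS has zero weight are dropped
            vec[word] += weight
    return dict(vec)


def class_bias(word, pos_freq, neg_freq):
    """Relative frequency difference in [-1, 1]; > 0 means more common in the positive class."""
    p, n = pos_freq.get(word, 0), neg_freq.get(word, 0)
    return (p - n) / (p + n) if p + n else 0.0


def adjust_similarity(sim, pos_freq, neg_freq):
    """Assumed rule: shrink the similarity of word pairs whose class bias differs."""
    adjusted = {}
    for (a, b), s in sim.items():
        gap = abs(class_bias(a, pos_freq, neg_freq) - class_bias(b, pos_freq, neg_freq))
        adjusted[(a, b)] = s * (1.0 - gap / 2.0)  # gap lies in [0, 2]
    return adjusted


def smooth(vec, sim):
    """Spread each word's weight onto its similar words; sim is keyed by (source, target)."""
    out = dict(vec)
    for (a, b), s in sim.items():
        if a in vec:
            out[b] = out.get(b, 0.0) + s * vec[a]
    return out
```

In this sketch, a protein-pair signature is first turned into a POS-weighted vector, and that vector is then smoothed with the adjusted similarity matrix before being passed to whatever feature-based classifier is in use. Two words that a corpus-based measure rates as similar, but that lean toward opposite classes, end up sharing less weight than two words that lean the same way.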


Last Update: 2016-01-26