[1] CAI Songcheng, NIU Yun. Protein-protein Interaction Identification Based on Word Frequency Count[J]. Computer Technology and Development, 2019, 29(02): 65-68. [doi:10.3969/j.issn.1673-629X.2019.02.013]

Protein-protein Interaction Identification Based on Word Frequency Count

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
No. 02, 2019
Pages:
65-68
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2019-02-10

Article Info

Title:
Protein-protein Interaction Identification Based on Word Frequency Count
Article ID:
1673-629X(2019)02-0065-04
Author(s):
CAI Song-cheng, NIU Yun
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
Keywords:
distant supervision; protein-protein interaction; expectation maximization algorithm; word frequency count
Classification number:
TP391
DOI:
10.3969/j.issn.1673-629X.2019.02.013
Abstract:
Current protein-protein interaction (PPI) extraction approaches based on distant supervision generate large-scale training data by aligning entity pairs in a knowledge base with entities in text, which effectively addresses the shortage of annotated data. Building on PPI identification with the expectation maximization (EM) algorithm, this paper proposes a PPI identification method based on word frequency counts. The method processes the signature of each protein pair and extracts the words between the two target proteins. These words are then POS-tagged, only nouns and verbs are kept, and the kept words are stemmed, which yields word frequency statistics for each protein pair's signature. A threshold on these statistics selects the high-frequency words of a signature, which are used to improve the initialization step of the EM algorithm. Experiments show that using high-frequency word information to derive the sentence categories used as initial values achieves higher and better-balanced precision and recall than the original EM-based model, a clear improvement over current distant-supervision PPI identification methods.
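The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The `PROT1`/`PROT2` placeholders, the `TOY_POS` lexicon, the `toy_stem` suffix stripper, the threshold value, and the rule that a sentence is initially labeled positive when it contains a high-frequency stem are all assumptions made for the example; a real pipeline would substitute a trained POS tagger and a proper stemmer, and the abstract does not specify how the initial sentence categories are derived.

```python
from collections import Counter

# Hypothetical toy POS lexicon (assumption); stands in for a trained POS tagger.
TOY_POS = {
    "binds": "VERB", "forms": "VERB", "complex": "NOUN",
    "directly": "ADV", "to": "PART", "with": "ADP", "and": "CONJ", "a": "DET",
}
KEEP_TAGS = {"NOUN", "VERB"}  # the abstract keeps only nouns and verbs


def toy_stem(word):
    """Naive suffix stripping (assumption); stands in for a real stemmer."""
    for suffix in ("ions", "ion", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def words_between(tokens, first="PROT1", second="PROT2"):
    """Return the tokens strictly between the two target-protein placeholders."""
    lo, hi = sorted((tokens.index(first), tokens.index(second)))
    return tokens[lo + 1 : hi]


def content_stems(sentence):
    """Stems of the nouns and verbs found between the two target proteins."""
    between = [w.lower() for w in words_between(sentence.split())]
    return [toy_stem(w) for w in between if TOY_POS.get(w) in KEEP_TAGS]


def signature_word_counts(sentences):
    """Word frequency statistics over all sentences of one protein pair."""
    counts = Counter()
    for sentence in sentences:
        counts.update(content_stems(sentence))
    return counts


def high_frequency_words(counts, threshold):
    """Stems whose count reaches the (assumed) threshold."""
    return {word for word, count in counts.items() if count >= threshold}


def initial_labels(sentences, high_freq):
    """Assumed rule: label 1 if a high-frequency stem occurs between the proteins."""
    return [int(any(s in high_freq for s in content_stems(x))) for x in sentences]


# Toy signature: every sentence mentioning the same (hypothetical) protein pair.
signature = [
    "PROT1 binds PROT2 in vitro",
    "PROT1 directly binds to PROT2",
    "the interaction of PROT1 with PROT2",
    "PROT1 and PROT2 were expressed",
    "PROT2 forms a complex with PROT1",
]
counts = signature_word_counts(signature)
high_freq = high_frequency_words(counts, threshold=2)
labels = initial_labels(signature, high_freq)
print(counts, high_freq, labels)
```

Under these assumptions, the resulting labels would serve only as starting values for the EM iterations in place of a uniform or random initialization; the EM updates themselves are outside the scope of this sketch.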


Last Update: 2019-02-10