[1] MIN Qing-kai, CAI Song-cheng. Protein-protein Interaction Identification Based on Cross Prediction[J]. Computer Technology and Development, 2018, 28(04): 17-20. [doi:10.3969/j.issn.1673-629X.2018.04.004]

Protein-protein Interaction Identification Based on Cross Prediction

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
28
Issue:
2018, No. 04
Pages:
17-20
Section:
Intelligence, Algorithms and Systems Engineering
Publication Date:
2018-04-10

Article Info

Title:
Protein-protein Interaction Identification Based on Cross Prediction
Article No.:
1673-629X(2018)04-0017-04
Author(s):
MIN Qing-kai (闵庆凯), CAI Song-cheng (蔡松成)
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
Keywords:
protein-protein interaction; distant supervision; cross prediction; noise removal
CLC Number:
TP301
DOI:
10.3969/j.issn.1673-629X.2018.04.004
Document Code:
A
Abstract:
Distant-supervision approaches to protein-protein interaction (PPI) extraction generate large-scale training data by aligning entity pairs in a knowledge base with entity mentions in text, which effectively addresses the shortage of hand-labeled data. However, training data produced this way contain a large amount of noise, i.e., sentences that are labeled wrongly. To remove this noise, we propose a cross-prediction approach. First, the training data are randomly divided into k folds; one fold is held out as the prediction set and the remaining k-1 folds serve as the training set. Rotating the training and prediction sets k times in turn, each fold is predicted, and its noise removed, by the model trained on the other k-1 folds. Next, the denoised folds are recombined into a new training set, and two models are trained: one on the training data before noise removal and one on the data after it. Finally, both models are evaluated on hand-labeled corpora. The experiments show that cross prediction effectively identifies noise in the training data and thereby improves PPI extraction performance.
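The cross-prediction procedure described in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract names neither the classifier nor the exact denoising rule, so a hypothetical multinomial Naive Bayes (`train_naive_bayes`) stands in for the model, and an instance is assumed to be noise when the out-of-fold prediction disagrees with its distant-supervision label. The function `cross_predict_denoise`, its `seed` parameter, and the toy sentences are likewise illustrative.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

# An instance is a tokenized sentence plus its distant-supervision label
# (1 = the protein pair is assumed to interact, 0 = not).
Instance = Tuple[List[str], int]
Predictor = Callable[[List[str]], int]


def train_naive_bayes(data: Sequence[Instance]) -> Predictor:
    """Hypothetical stand-in classifier: multinomial Naive Bayes with add-one smoothing."""
    label_counts = Counter(label for _, label in data)
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for tokens, label in data:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    totals = {label: sum(counts.values()) for label, counts in word_counts.items()}
    v = len(vocab) + 1  # one extra slot for tokens unseen in training
    n = len(data)

    def predict(tokens: List[str]) -> int:
        def log_posterior(label: int) -> float:
            score = math.log(label_counts[label] / n)
            for t in tokens:
                score += math.log((word_counts[label][t] + 1) / (totals[label] + v))
            return score
        return max(label_counts, key=log_posterior)

    return predict


def cross_predict_denoise(data: Sequence[Instance], k: int,
                          train: Callable[[Sequence[Instance]], Predictor] = train_naive_bayes,
                          seed: int = 0) -> List[Instance]:
    """Denoise distantly labeled data by k-fold cross prediction.

    Each fold is predicted by a model trained on the other k-1 folds; an
    instance is treated as noise (an assumed rule) when the prediction
    disagrees with its distant label. Kept instances are recombined in
    their original order to form the new training set.
    """
    if not 2 <= k <= len(data):
        raise ValueError("k must satisfy 2 <= k <= len(data)")
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)           # random split into k folds
    folds = [indices[i::k] for i in range(k)]
    kept = []
    for i, held_out in enumerate(folds):           # rotate the prediction fold k times
        train_set = [data[j] for m, fold in enumerate(folds) if m != i for j in fold]
        predict = train(train_set)
        kept.extend(j for j in held_out if predict(data[j][0]) == data[j][1])
    return [data[j] for j in sorted(kept)]


if __name__ == "__main__":
    clean = [(["p", "binds", "q"], 1) for _ in range(10)] + \
            [(["p", "is", "expressed"], 0) for _ in range(10)]
    noisy = [(["p", "is", "expressed"], 1)]  # wrongly labeled by distant supervision
    denoised = cross_predict_denoise(clean + noisy, k=5)
    print(len(denoised))  # 20: the mislabeled sentence is dropped
```

The key design point is that each fold is labeled only by a model that never saw it, so a mislabeled sentence cannot vote for its own wrong label. Comparing a model trained on `clean + noisy` with one trained on `denoised` mirrors the before/after comparison in the abstract.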

Last Update: 2018-05-30