[1] PENG Yun-lei, NIU Yun. Feature Words Selection Based on Word Embedding[J]. Computer Technology and Development, 2018, 28(06): 7-11. [doi:10.3969/j.issn.1673-629X.2018.06.002]

Feature Words Selection Based on Word Embedding

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
28
Issue:
2018, No. 06
Pages:
7-11
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2018-06-10

Article Info

Title:
Feature Words Selection Based on Word Embedding
Article ID:
1673-629X(2018)06-0007-05
Author(s):
PENG Yun-lei (彭昀磊), NIU Yun (牛耘)
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu, China
Keywords:
protein-protein interaction; word embedding; clustering; feature words
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2018.06.002
Document code:
A
Abstract:
Protein-protein interaction information helps solve many medical problems, and it is recorded in the medical literature. However, the biomedical literature grows rapidly every year, and collecting this information manually can no longer meet practical needs. Building on weakly supervised protein-protein interaction identification, this paper proposes a word-embedding-based method for feature word selection. The method uses word embeddings to generate a vector for each word in the feature word set, so that comparing the similarity of two words becomes comparing the similarity of their vectors. The words are then clustered, and the words that better express protein-protein interaction relations are selected from the clusters to form a new feature word set, making protein-protein interaction identification more efficient and accurate. Clustering on word embeddings groups similar words together without requiring the words to be identical, which improves the clustering result. Experimental results show that, using one fifth of the feature words, the method achieves better results than the baseline without feature word selection.
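
The pipeline described in the abstract (one vector per word, similarity compared on vectors, clustering, then selection from the clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors, the greedy threshold clustering, the `RELEVANCE` scores, and the rule "keep the most relevant word per cluster" are all assumptions made for illustration, since the abstract does not specify the embedding model, the clustering algorithm, or the selection criterion.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values). In practice the vectors
# would come from a trained word-embedding model.
EMBEDDINGS = {
    "bind": [0.9, 0.1, 0.0],
    "binds": [0.88, 0.12, 0.02],
    "interact": [0.8, 0.3, 0.1],
    "inhibit": [0.1, 0.9, 0.1],
    "inhibits": [0.12, 0.88, 0.08],
    "patient": [0.0, 0.1, 0.95],
}

# Hypothetical score of how well each word expresses an interaction.
# The abstract does not say how this is measured; it is an assumption here.
RELEVANCE = {
    "bind": 0.9, "binds": 0.8, "interact": 0.95,
    "inhibit": 0.7, "inhibits": 0.6, "patient": 0.05,
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(embeddings, threshold=0.9):
    """Greedy clustering: a word joins the first cluster whose seed word
    (its first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for word, vec in embeddings.items():
        for cluster in clusters:
            if cosine(vec, embeddings[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def select_feature_words(clusters, relevance, min_relevance=0.5):
    """Keep the most relevant word of each cluster, dropping clusters whose
    best word falls below `min_relevance`."""
    selected = []
    for cluster in clusters:
        best = max(cluster, key=lambda w: relevance[w])
        if relevance[best] >= min_relevance:
            selected.append(best)
    return selected


clusters = cluster_words(EMBEDDINGS)
feature_words = select_feature_words(clusters, RELEVANCE)
print(clusters)
print(feature_words)
```

With these toy values, near-synonyms such as "bind", "binds", and "interact" fall into one cluster even though the strings differ, and the reduced feature set keeps one representative per relevant cluster ("interact" and "inhibit"). A real system would likely replace the greedy pass with a standard clustering algorithm and derive relevance from the training data.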

Last Update: 2018-07-20