[1] JIA Qing, YANG Shu. Research on Clone Code Detection Method Based on Word2vec[J]. Computer Technology and Development, 2020, 30(08): 124-128. [doi:10.3969/j.issn.1673-629X.2020.08.021]

Research on Clone Code Detection Method Based on Word2vec

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
30
Issue:
2020, No. 08
Pages:
124-128
Section:
Applied Development Research
Publication date:
2020-08-10

Article Info

Title:
Research on Clone Code Detection Method Based on Word2vec
Article ID:
1673-629X(2020)08-0124-05
Author(s):
JIA Qing, YANG Shu
School of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China
Keywords:
Word2vec; clone code; automatic detection; similarity; software maintenance
Classification number:
TP311
DOI:
10.3969/j.issn.1673-629X.2020.08.021
Abstract:
Clone code in a system increases the time programmers need to understand and modify code, and an error in one cloned fragment may mean that the same error exists at every other location in the system that contains the same code, which greatly increases the cost of software maintenance. To find the clone code in the system files, a Word2vec-based clone code detection method is applied to the code of the Xinjiang Horse Industry e-commerce platform. The system source code is first cleaned to remove unneeded characters. Word2vec is a family of shallow, two-layer neural network models; the skip-gram model in Word2vec is selected for training and for constructing word vectors. After training, the model maps each word to a vector that expresses the relationships between words. Finally, code similarity is calculated as the cosine of the angle between vectors, so that clone code is detected automatically. The results show that the Word2vec-based method detects clone code in the code files effectively and outputs it in the specified format.
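The steps named in the abstract (cleaning, skip-gram training pairs, cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The helper names, the comment-stripping rules, the window size, the averaging of word vectors into one fragment vector, and the toy embedding table are all assumptions made for illustration; in practice the vectors would come from a trained skip-gram model.

```python
import math
import re


def clean_code(source):
    """Data cleaning (assumed rules): drop C-style comments and all
    non-identifier characters, then return lowercase tokens."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)                    # line comments
    return re.findall(r"[a-z_][a-z0-9_]*", source.lower())


def skipgram_pairs(tokens, window=2):
    """Build the (center, context) pairs that a skip-gram model is trained on."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def fragment_vector(tokens, embeddings):
    """Average the vectors of known tokens (an assumed way to get one
    vector per code fragment; the abstract does not specify this step)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return []
    dim = len(known[0])
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero or empty."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 2-dimensional embeddings standing in for a trained model.
TOY_EMBEDDINGS = {
    "int": [1.0, 0.0],
    "total": [0.0, 1.0],
    "price": [0.0, 1.0],
    "count": [1.0, 1.0],
}

fragment_a = clean_code("int total = price; // sum")
fragment_b = clean_code("int total = price; /* copy */")
score = cosine_similarity(
    fragment_vector(fragment_a, TOY_EMBEDDINGS),
    fragment_vector(fragment_b, TOY_EMBEDDINGS),
)
print(fragment_a, fragment_b, score)
```

Two fragments that differ only in comments reduce to the same token list after cleaning, so their similarity is at the maximum; a detector built this way would flag pairs whose score exceeds a chosen threshold, which is itself a tuning decision not given in the abstract.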

Last Update: 2020-08-10