[1] ZHAN Jia-jun (展佳俊), ZHAO Feng-yu (赵逢禹), AI Jun (艾均). Source Code Similarity Detection Technology Based on Multiple Eigenvalues[J]. Computer Technology and Development, 2021, 31(01): 103-109. [doi:10.3969/j.issn.1673-629X.2021.01.019]
Source Code Similarity Detection Technology Based on Multiple Eigenvalues
Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]
- Volume:
-
31
- Issue:
-
2021, Issue 01
- Pages:
-
103-109
- Column:
-
Systems Engineering
- Publication date:
-
2021-01-10
Article Info
- Title:
-
Source Code Similarity Detection Technology Based on Multiple Eigenvalues
- Article ID:
-
1673-629X(2021)01-0103-07
- Author(s):
-
ZHAN Jia-jun (展佳俊); ZHAO Feng-yu (赵逢禹); AI Jun (艾均)
-
School of Optoelectronic Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Keywords:
-
code similarity; code plagiarism; abstract syntax tree; code feature extraction; cosine similarity
- CLC number:
-
TP311
- DOI:
-
10.3969/j.issn.1673-629X.2021.01.019
- Abstract:
-
During software development, developers commonly meet requirements through copy-and-paste development or modular development. Both approaches improve development efficiency, but they also leave large amounts of identical or similar code in a software system. This similar code makes software maintenance considerably harder and is the most common target of refactoring. Source code similarity measurement uses a detection method to analyze the degree of similarity between program source codes. The technique is applied in code plagiarism detection, code clone detection, software intellectual property protection, code reuse, and other fields. To improve the accuracy of code similarity measurement, a source code similarity detection technique based on multiple eigenvalues is proposed. Methods are constructed for extracting features from source code comments, signatures, code text statements, and code structure, and a measurement model for source code similarity is given. Comparative experiments against Moss, an authoritative code similarity detection system, show that the proposed method detects similar code more accurately.
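The abstract lists cosine similarity among its keywords and describes a measurement model over several feature kinds, but gives no formulas. The following is only a rough sketch of how such a multi-feature score might be combined, not the authors' model: the feature names, the regex tokenizer, the `WEIGHTS` values, and the function `combined_similarity` are all hypothetical, and plain token counts stand in for the abstract-syntax-tree features the paper refers to.

```python
import math
import re
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def tokens(text: str) -> Counter:
    """Count identifiers, numbers, and single punctuation characters."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|\S", text))


# Hypothetical weights for illustration; the paper's actual values are not given here.
WEIGHTS = {"comment": 0.1, "signature": 0.2, "statement": 0.3, "structure": 0.4}


def combined_similarity(f1: dict, f2: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-feature cosine similarities."""
    total = sum(weights.values())
    score = sum(
        w * cosine_similarity(tokens(f1.get(name, "")), tokens(f2.get(name, "")))
        for name, w in weights.items()
    )
    return score / total


if __name__ == "__main__":
    a = {
        "comment": "add two numbers",
        "signature": "int add(int a, int b)",
        "statement": "return a + b;",
        "structure": "Method Return BinaryExpr",
    }
    b = dict(a, comment="sum of x and y", statement="return x + y;")
    print(round(combined_similarity(a, b), 3))
```

Treating each feature kind as its own vector means that renaming variables lowers only the statement-level score, while the signature and structure scores can stay high; how much each kind should count is exactly what a real measurement model would have to calibrate.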
Last Update:
2020-01-10