PDGcross: Source Code Vulnerability Detection Based on Cross-file Graph Representation - Computer Technology and Development

Article Information/Info

Title:: PDGcross: Source Code Vulnerability Detection Based on Cross-file Graph Representation

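Authors:: 熊可欣 (XIONG Ke-xin)¹; 李涛 (LI Tao)²*; 余琴 (YU Qin)¹; 乔梦晴 (QIAO Meng-qing)¹; 1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430065;
2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, Hubei 430065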

Author(s):: XIONG Ke-xin¹; LI Tao²*; YU Qin¹; QIAO Meng-qing¹; 1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China;
2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, China

Keywords:: function calling; program dependence graph; vulnerability detection; source code; deep learning

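Abstract (translated from Chinese):: As demand for software security continues to grow, a large body of research has applied deep learning to vulnerability detection, and many source code vulnerability detection methods now exist. Current methods perform well at detecting vulnerabilities caused by function calls within a single file, but because complex function call relationships may span multiple files, multi-file vulnerability detection remains one of the current difficulties. This paper therefore proposes PDGcross, a new graph representation built on the program dependence graph of the source code: starting from one entry file, the other files it calls are merged in to produce a single graph representation, PDGcross. The Node2Vec graph embedding algorithm then processes PDGcross into a feature matrix, and a long short-term memory (LSTM) neural network is trained on it to obtain a vulnerability classification model, yielding a source code vulnerability detection method based on a cross-file program dependence graph representation and deep learning. In the experiments, for vulnerabilities arising from cross-file function calls, Fortify and the PDG representation showed very low detection efficiency, while the proposed PDGcross-based detection method clearly outperformed both.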

Abstract:: With the increasing demand for software security, a large body of research has applied deep learning to the field of vulnerability detection. At this stage, existing methods are effective at detecting vulnerabilities caused by function calls within a single file. However, complex function call relationships may involve multiple files, and vulnerability detection across multiple files is one of the current detection difficulties. Therefore, we propose a new graph representation, PDGcross, based on the program dependence graph of the source code, which is generated by merging the other called files starting from one entry file. The Node2Vec graph embedding algorithm is used to further process PDGcross into a feature matrix. A Long Short-Term Memory network is used to train the vulnerability classification model, and a source code detection method based on cross-file program dependence graph representation and deep learning is implemented. In the experiments, Fortify and the PDG representation have low detection efficiency for vulnerabilities generated by cross-file function calls, while the proposed detection method based on the PDGcross representation is significantly better than these two methods.
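
As an illustration only, the cross-file merging step described in the abstract (start from one entry file and merge in the files it calls) might be sketched as follows. This is not the authors' implementation: the function name `merge_cross_file_graph`, the dictionary layout for per-file dependence edges and cross-file calls, and the example file and node names are all hypothetical assumptions. The sketch covers only the graph merge; the Node2Vec embedding and LSTM training stages are not shown.

```python
from collections import deque


def merge_cross_file_graph(entry_file, file_graphs, cross_calls):
    """Merge the dependence graphs of all files reachable from entry_file.

    Assumed (hypothetical) input layout:
      file_graphs: {file: [(src_node, dst_node), ...]}   intra-file edges
      cross_calls: {file: [(call_node, callee_file, callee_node), ...]}
    Returns (nodes, edges); each node is qualified as (file, node_name)
    so that identically named nodes in different files do not collide.
    """
    nodes, edges = set(), set()
    visited = {entry_file}
    queue = deque([entry_file])

    while queue:
        current = queue.popleft()

        # Copy the intra-file dependence edges of the current file.
        for src, dst in file_graphs.get(current, []):
            a, b = (current, src), (current, dst)
            nodes.update((a, b))
            edges.add((a, b))

        # Link each cross-file call site to the callee's entry node,
        # and schedule the callee file for merging if it is new.
        for call_node, callee_file, callee_node in cross_calls.get(current, []):
            caller, callee = (current, call_node), (callee_file, callee_node)
            nodes.update((caller, callee))
            edges.add((caller, callee))
            if callee_file not in visited:
                visited.add(callee_file)
                queue.append(callee_file)

    return nodes, edges


# Hypothetical toy project: main.c calls a copy routine defined in util.c;
# unused.c is never called from the entry file.
file_graphs = {
    "main.c": [("buf_decl", "call_copy")],
    "util.c": [("copy_entry", "strcpy_stmt")],
    "unused.c": [("x", "y")],
}
cross_calls = {"main.c": [("call_copy", "util.c", "copy_entry")]}

nodes, edges = merge_cross_file_graph("main.c", file_graphs, cross_calls)
```

In this toy input, the merged graph contains the edges of `main.c` and `util.c` plus one connecting call edge, while `unused.c` is left out because it is not reachable from the entry file. A breadth-first traversal with a visited set is one simple way to handle files that call each other in cycles.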