«上一篇/Previous Article|本期目录/Table of Contents|下一篇/Next Article»

j.cnki.ISSN1673-629X.2024.0050]
点击复制

基于混合表征和协同训练的软件漏洞检测()

《计算机技术与发展》[ISSN:1006-6977/CN:61-1281/TN]

卷:: 34
期数:: 2024年05期

页码:: 126-132

栏目:: 人工智能

出版日期:: 2024-05-10

文章信息/Info

Title:: Software Vulnerability Detection Based on Mixed Representation and Cooperative Training

文章编号:: 1673-629X(2024)05-0126-07

作者:: 陈浩东1; 李琳1; 2; 乔梦晴1; 叶彪1; 1. 武汉科技大学计算机科学与技术学院,湖北武汉 430065;2. 智能信息处理与实时工业系统湖北省重点实验室,湖北武汉 430065

Author(s):: CHEN Hao-dong1; LI Lin1; 2; QIAO Meng-qing1; YE Biao1; 1. School of Computer Science and Technology,Wuhan University of Science and Technology,Wuhan 430065,China;2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System,Wuhan 430065,China

关键词:: 深度学习; 混合表征; 漏洞检测; 协同训练; 集成学习

Keywords:: deep learning; mixed feature; vulnerability detection; cooperative training; ensemble learning

分类号:: TP309

DOI:: 10.20165/j.cnki.ISSN1673-629X.2024.0050

摘要:: 对于漏洞领域基准数据集较少导致的深度学习模型泛化能力较差,以及传统的基于规则引擎的漏洞检测工具性能较低的问题,提出了一种基于混合表征和协同训练的软件源代码漏洞检测方法。首先,基于预训练模型提取源代码文本特征,提取代码语义信息,然后使用工具生成抽象语法树,通过自定义遍历规则提取源代码的 AST(抽象语法树)特征,将两种特征进行混合丰富代码表征。其次,搭建多个深度模型,基于协同训练算法通过大量的无标签数据提升各模型的泛化能力。鉴于单一模型可能造成较高的漏报率和误报率,并可能被某一模型主导预测结果的问题,采用了基于加权投票机制的多模型集成方法。实验结果表明,该方法在一定程度上解决了数据集较少导致的模型泛化性差的问题,与漏洞检测领域一些主流检测方法相比,该方法在各指标上具有一定的优势,且检测性能高于规则引擎 Fortify。

Abstract:: In order to solve the problems of poor generalization ability of deep learning model and low performance of traditional rule engine-based vulnerability detection tools due to fewer benchmark datasets in vulnerability domain,a method of software source code vul-nerability detection based on mixed representation and cooperative training was proposed. Firstly,source code text features and code semantic information are extracted based on the pre - trained model. Then, tools are used to generate abstract syntax tree, and AST (Abstract syntax tree) features of source code are extracted by custom traversal rules,and the two features are mixed to enrich code repre-sentation. Secondly,multiple deep models are built,and the generalization ability of each model is improved through a large number of unlabeled data based on cooperative training algorithm. In view of the problem that a single model may cause high false positive rate and high false positive rate,and that one model may dominate the prediction results,a multi-model integration method based on weighted voting mechanism is adopted. The experimental results show that the proposed method can solve the problem of poor model generalization caused by fewer data sets to some extent. Compared with some mainstream detection methods in the field of vulnerability detection,the proposed method has certain advantages in various indicators,and the detection performance is higher than that of the rule engine Fortify.

相似文献/References:

[1]陈强锐,谢世朋.基于深度学习的肺部肿瘤检测方法[J].计算机技术与发展,2018,28(04):201.[doi:10.3969/ j. issn.1673-629X.2018.04.043]
　CHEN Qiang-rui,XIE Shi-peng.Lung Cancer Detection Method Based on Deep Learning[J].,2018,28(05):201.[doi:10.3969/ j. issn.1673-629X.2018.04.043]
[2]施泽浩,赵启军.基于全卷积网络的目标检测算法[J].计算机技术与发展,2018,28(05):55.[doi:10.3969/j.issn.1673－629X.2018.05.013]
　SHI Ze-hao,ZHAO Qi-jun.Object Detection Algorithm Based on Fully Convolutional Neural Network[J].,2018,28(05):55.[doi:10.3969/j.issn.1673－629X.2018.05.013]
[3]黄法秀,张世杰,吴志红,等.数据增广下的人脸识别研究[J].计算机技术与发展,2020,30(03):67.[doi:10. 3969 / j. issn. 1673-629X. 2020. 03. 013]
　HUANG Fa-xiu,ZHANG Shi-jie,WU Zhi-hong,et al.Research on Face Recognition Based on Data Augmentation[J].,2020,30(05):67.[doi:10. 3969 / j. issn. 1673-629X. 2020. 03. 013]
[4]陈浩翔,蔡建明,刘铿然,等. 手写数字深度特征学习与识别[J].计算机技术与发展,2016,26(07):19.
　CHEN Hao-xiang,CAI Jian-ming,LIU Keng-ran,et al. Deep Learning and Recognition of Handwritten Numeral Features[J].,2016,26(05):19.
[5]高翔,陈志,岳文静,等.基于视频场景深度学习的人物语义识别模型[J].计算机技术与发展,2018,28(06):53.[doi:10.3969/ j. issn.1673-629X.2018.06.012]
　GAO Xiang,CHEN Zhi,YUE Wen-jing,et al.Human Semantic Recognition Model Based on Video Scene Deep Learning[J].,2018,28(05):53.[doi:10.3969/ j. issn.1673-629X.2018.06.012]
[6]贺飞翔,赵启军. 基于深度学习的头部姿态估计[J].计算机技术与发展,2016,26(11):1.
　HE Fei-xiang,ZHAO Qi-jun. Head Pose Estimation Based on Deep Learning[J].,2016,26(05):1.
[7]徐融,邱晓晖.一种改进的 YOLO V3 目标检测方法[J].计算机技术与发展,2020,30(07):30.[doi:10. 3969 / j. issn. 1673-629X. 2020. 07. 007]
　XU Rong,QIU Xiao-hui.An Improved YOLO V3 Object Detection[J].,2020,30(05):30.[doi:10. 3969 / j. issn. 1673-629X. 2020. 07. 007]
[8]曾志平[] [],萧海东[],张新鹏[]. 基于DBN的金融时序数据建模与决策[J].计算机技术与发展,2017,27(04):1.
　ZENG Zhi-ping[] [],XIAO Hai-dong[],ZHANG Xin-peng[]. Modeling and Decision-making of Financial Time Series Data with DBN[J].,2017,27(05):1.
[9]李全兵,文钊*,田艳梅*,等.基于 WGAN 的音频关键词识别研究[J].计算机技术与发展,2021,31(08):26.[doi:10. 3969 / j. issn. 1673-629X. 2021. 08. 005]
　LI Quan-bing,WEN Zhao *,TIAN Yan-mei *,et al.Research on Audio Keywords Recognition Based on WassersteinGenerative Adversarial Network[J].,2021,31(05):26.[doi:10. 3969 / j. issn. 1673-629X. 2021. 08. 005]
[10]李宏林. 分析式纹理合成技术及其在深度学习的应用[J].计算机技术与发展,2017,27(11):7.
　LI Hong-lin. Analyzed Texture-synthesis Techniques and Their Applications in Deep Learning[J].,2017,27(05):7.

常用功能

工具/Tools

统计/Statistics

摘要浏览/Viewed287
全文下载/Downloads109
评论/Comments