[1] LI Zi-jun, LI Tao*, CHEN Hao-dong, et al. Software Vulnerability Detection Method Based on Abstract Syntax Tree Feature Migration (AST-FMVD)[J]. Computer Technology and Development, 2024, 34(06): 81-88. [doi:10.20165/j.cnki.ISSN1673-629X.2024.0065]

Software Vulnerability Detection Method Based on Abstract Syntax Tree Feature Migration (AST-FMVD)

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
34
Issue:
2024, No. 06
Pages:
81-88
Section:
Cyberspace Security
Publication date:
2024-06-10

Article Info

Title:
Software Vulnerability Detection Method Based on Abstract Syntax Tree Feature Migration (AST-FMVD)
Article ID:
1673-629X(2024)06-0081-08
Author(s):
LI Zi-jun 1, LI Tao 2*, CHEN Hao-dong 3, YU Qin 1, QIAO Meng-qing 1, LI Lin 2, WANG Jie 3, WAN Zhen-hua 3, SONG Jing-han 3
1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China; 2. Hubei Provincial Key Laboratory of Intelligent Information Processing and Real-time Industrial Systems, Wuhan 430065, China; 3. Shenzhen Kaiyuan Internet Security Technology Co., Ltd., Shenzhen 518000, China
Keywords:
deep learning; transfer learning; zero-shot; vulnerability detection; abstract syntax tree
Classification number:
TP389;TP309
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0065
Abstract:
Deep learning has made significant progress in vulnerability detection. Existing vulnerability detection algorithms require a large amount of labeled data and build detection models through supervised methods. In a multi-language environment, the diversity of languages and the lack of labeled training samples can limit how well a detection model generalizes, and performance may be poor when only a few samples are available. Transfer learning offers a way out of this dilemma: its core idea is "learning by analogy", transferring knowledge from one domain to the learning task of another and thereby breaking the constraints of sample data. We propose a vulnerability detection method based on feature transfer. By clustering the syntax tree node information of the code according to semantic similarity, the node mapping relationship between different languages can be constructed quickly and accurately. In addition, context-aware technology is introduced into the syntax tree mapping process to help resolve ambiguous or vague grammatical structures and improve parsing performance. The method transforms detection samples from the unknown domain into the known domain, so that the deep learning model built in the original domain can be applied to the new-domain task, realizing cross-domain knowledge transfer. We name the method AST-FMVD. Finally, a Java vulnerability detection model is used to detect files containing specific vulnerabilities, realizing the transfer of the model to the Python domain. This shows the feasibility of AST-FMVD, and experiments demonstrate that a model trained in the source domain can maintain the good detection level of the original model in the target domain.
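
The node-mapping idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the Java-side vocabulary `JAVA_NODE_TYPES`, the seed table `SEED_MAP`, and both function names are hypothetical, and plain string similarity from Python's `difflib` stands in for the semantic-similarity clustering and context-aware disambiguation described in the paper. The sketch only shows the general shape of the transformation: a Python file is parsed into an abstract syntax tree, and its node types are rewritten into the vocabulary a Java-trained model would expect.

```python
import ast
from difflib import get_close_matches

# Hypothetical source-domain (Java) node vocabulary; not taken from the paper.
JAVA_NODE_TYPES = [
    "MethodDeclaration", "ClassDeclaration", "IfStmt", "ForStmt", "WhileStmt",
    "ReturnStmt", "ExpressionStmt", "MethodCallExpr", "AssignExpr",
    "BinaryExpr", "NameExpr",
]

# Hand-written seed pairs for node types whose names differ across the two
# languages (an assumption made for this illustration).
SEED_MAP = {
    "FunctionDef": "MethodDeclaration",
    "ClassDef": "ClassDeclaration",
    "Call": "MethodCallExpr",
    "Assign": "AssignExpr",
    "BinOp": "BinaryExpr",
    "If": "IfStmt",
    "Expr": "ExpressionStmt",
}


def map_node_type(py_type):
    """Return the Java-side name for a Python AST node type, or None."""
    if py_type in SEED_MAP:
        return SEED_MAP[py_type]
    # String similarity is only a stand-in for semantic similarity.
    matches = get_close_matches(py_type, JAVA_NODE_TYPES, n=1, cutoff=0.6)
    return matches[0] if matches else None


def to_source_domain_sequence(python_source):
    """Parse Python code and rewrite its node types in the Java vocabulary."""
    tree = ast.parse(python_source)
    mapped = (map_node_type(type(node).__name__) for node in ast.walk(tree))
    # Node types without a counterpart are dropped in this sketch.
    return [name for name in mapped if name is not None]


if __name__ == "__main__":
    sample = "def f(x):\n    if x:\n        return g(x)\n"
    print(to_source_domain_sequence(sample))
```

A sequence produced this way could, in principle, be fed to a detector trained only on Java samples. How well that works depends on the quality of the mapping, which is why surface string matching is unlikely to be enough on its own.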


Last Update: 2024-06-10