[1] CHEN Chunling, WU Fan, YU Han. A Classification and Recognition Model of Malicious Requests Based on Logistic Regression[J]. Computer Technology and Development, 2019, 29(02): 124-128. [doi:10.3969/j.issn.1673-629X.2019.02.026]

A Classification and Recognition Model of Malicious Requests Based on Logistic Regression

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
2019, No. 02
Pages:
124-128
Column:
Security and Protection
Publication date:
2019-02-10

Article Info

Title:
A Classification and Recognition Model of Malicious Requests Based on Logistic Regression
Article ID:
1673-629X(2019)02-0124-05
Author(s):
CHEN Chun-ling (陈春玲), WU Fan (吴凡), YU Han (余瀚)
School of Computer Science & Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
Keywords:
Web requests; logistic regression; maximum likelihood estimation; TF-IDF; classification model
CLC number:
TP301.6
DOI:
10.3969/j.issn.1673-629X.2019.02.026
Abstract:
To defend against attacks on the Web application layer and to classify and recognize malicious requests effectively, we study supervised learning methods in depth. To address the insufficient content and sparse features of request text, we propose a malicious request classification model that combines a TF-IDF segmentation strategy based on non-repeating multi-N-Gram with logistic regression. Features are extracted from a large number of samples collected from sources such as the Secrepo security data sample repository and are then used to train the model. Maximum likelihood estimation serves as the optimization objective, gradient descent is used to obtain the optimal classification model, and the model's reliability is validated on a test set. The experimental results show that, for short request content with little semantic information, a classification model built on letter-level multi-N-Gram segmentation achieves higher classification accuracy and scores than models built on word segmentation or single N-Gram segmentation, with little difference in training time. The final model trained with this method exceeds 99% in accuracy, recall, and F1 score on the test set.
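The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not spell out the non-repeating variant of the segmentation, so the sketch uses plain TF-IDF over character n-grams with n = 1, 2, 3, a smoothed IDF, and batch gradient ascent on the average log-likelihood (equivalent to gradient descent on the negative log-likelihood). All function names, hyperparameters (`lr`, `epochs`, `n_values`), and the eight toy requests are assumptions made for illustration; they do not come from the paper or from Secrepo.

```python
import math
from collections import Counter


def char_ngrams(text, n_values=(1, 2, 3)):
    """Return the character n-grams of `text` for every n in n_values."""
    return [text[i:i + n] for n in n_values for i in range(len(text) - n + 1)]


def fit_tfidf(docs, n_values=(1, 2, 3)):
    """Build a vocabulary and smoothed IDF weights from the training requests."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n_values)))  # document frequency
    vocab = {gram: j for j, gram in enumerate(sorted(df))}
    n_docs = len(docs)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1.0 for g in vocab}
    return vocab, idf


def transform(doc, vocab, idf, n_values=(1, 2, 3)):
    """Map one request to a sparse, L2-normalised TF-IDF vector {index: weight}."""
    counts = Counter(g for g in char_ngrams(doc, n_values) if g in vocab)
    vec = {vocab[g]: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {j: v / norm for j, v in vec.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """P(malicious | x) under the logistic model."""
    return sigmoid(b + sum(w[j] * v for j, v in x.items()))


def log_likelihood(X, y, w, b, eps=1e-12):
    """Bernoulli log-likelihood of the labels, the quantity being maximised."""
    total = 0.0
    for x, label in zip(X, y):
        p = predict_proba(x, w, b)
        total += label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total


def train(X, y, dim, lr=1.0, epochs=300):
    """Batch gradient ascent on the average log-likelihood."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            err = label - predict_proba(x, w, b)  # d(log-lik)/d(score)
            for j, v in x.items():
                grad_w[j] += err * v
            grad_b += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


# Hypothetical toy data: 0 = benign, 1 = malicious.
requests = [
    "/index.html", "/images/logo.png", "/search?q=weather", "/user/profile?id=42",
    "/search?q=<script>alert(1)</script>", "/item?id=1' OR '1'='1",
    "/download?file=../../etc/passwd", "/login?user=admin'--",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

vocab, idf = fit_tfidf(requests)
X = [transform(r, vocab, idf) for r in requests]
ll_before = log_likelihood(X, labels, [0.0] * len(vocab), 0.0)
w, b = train(X, labels, len(vocab))
ll_after = log_likelihood(X, labels, w, b)
probs = [predict_proba(x, w, b) for x in X]
print(f"log-likelihood: {ll_before:.3f} -> {ll_after:.3f}")
```

Character-level n-grams are used here because URL-style requests rarely split into meaningful words, which is the motivation the abstract gives for letter-level segmentation. A realistic evaluation would hold out a separate test set and report accuracy, recall, and F1 score rather than training-set likelihood.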
Last Update: 2019-02-10