[1] CHEN Peng, LI Yong-zhi, YU Xiao-sheng. Method for Quickly Identifying Phishing Websites Based on Feature Selection Model[J]. Computer Technology and Development, 2021, 31(04): 40-45. [doi:10.3969/j.issn.1673-629X.2021.04.007]

Method for Quickly Identifying Phishing Websites Based on Feature Selection Model

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume: 31
Issue: 2021, No. 04
Pages: 40-45
Section: Big Data Analysis and Mining
Publication date: 2021-04-10

Article Info

Title:
Method for Quickly Identifying Phishing Websites Based on Feature Selection Model
Article ID:
1673-629X(2021)04-0040-06
Author(s):
CHEN Peng, LI Yong-zhi, YU Xiao-sheng
School of Computer and Information, Three Gorges University, Yichang, Hubei 443002, China
Keywords:
feature selection; information gain; Chi-square test; random forest; recursive feature elimination
CLC number:
TP391
DOI:
10.3969/j.issn.1673-629X.2021.04.007
Abstract:
Research on identifying phishing websites places ever higher demands on recognition speed. This paper therefore proposes a fast phishing website identification method based on a hybrid feature selection model. The model has three main parts: primary feature selection, secondary feature selection, and classification. It combines information gain with the Chi-square test, followed by a recursive feature elimination algorithm based on random forest. Within the model, a distribution function and its gradient are used to obtain the optimal cutoff threshold and hence the optimal dataset, which improves the efficiency of phishing website identification. Experimental results show that feature selection with this hybrid model reduces the dimension of the dataset by 79.2%. With almost no loss of classification accuracy, it also reduces the classification time complexity by 32%, which substantially improves classification efficiency. The model was also evaluated on the large phishing dataset from the UCI Machine Learning Repository. There, classification accuracy drops by 1.7%, but the dataset dimension is reduced by 70% and the classification time complexity by 41.1%.
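The primary (filter) stage described above can be sketched in plain Python as follows. This is a minimal sketch, not the authors' code, and it makes several assumptions:

- Feature values are discrete (for example coded -1/0/1).
- The information-gain and Chi-square scores are fused by averaging their max-normalised values; the abstract does not say how they are combined.
- "Distribution function and gradient" is read as follows. The distribution function is the cumulative share of total score held by the top-k features. Its gradient is the share added by each further feature. The cutoff is the first feature whose gradient falls below a hypothetical tolerance `tol`.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(column, labels):
    """Pearson Chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    x_counts, y_counts = Counter(column), Counter(labels)
    observed = Counter(zip(column, labels))
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # always > 0: both marginals are observed
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def combined_scores(rows, labels):
    """Per-feature average of max-normalised IG and Chi-square (assumed fusion rule)."""
    columns = list(zip(*rows))
    ig = [information_gain(c, labels) for c in columns]
    chi = [chi_square(c, labels) for c in columns]
    max_ig, max_chi = max(ig) or 1.0, max(chi) or 1.0
    return [(a / max_ig + b / max_chi) / 2 for a, b in zip(ig, chi)]


def gradient_cutoff(scores, tol=0.05):
    """Return indices of the top-ranked features kept before the score CDF flattens.

    tol is a hypothetical tolerance, not a value taken from the paper.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(scores) or 1.0
    cdf, running = [], 0.0
    for i in order:
        running += scores[i]
        cdf.append(running / total)  # empirical cumulative distribution F(k)
    # Discrete gradient F(k) - F(k-1): marginal share added by the k-th feature.
    grad = [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])]
    keep = []
    for idx, g in zip(order, grad):
        if g < tol:
            break
        keep.append(idx)
    return keep


if __name__ == "__main__":
    # Toy data: f0 separates the classes, f1 is pure noise, f2 is partly informative.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    f0 = [1, 1, 1, 1, -1, -1, -1, -1]
    f1 = [1, -1, 1, -1, 1, -1, 1, -1]
    f2 = [1, 1, 1, -1, -1, -1, -1, 1]
    rows = list(zip(f0, f1, f2))
    print(gradient_cutoff(combined_scores(rows, labels)))  # -> [0, 2]
```

The secondary stage (random-forest-based recursive feature elimination) would then run on the surviving columns only. One off-the-shelf option is scikit-learn's `sklearn.feature_selection.RFE` wrapped around a `RandomForestClassifier`. Because the cheap filter stage shrinks the feature set first, fewer costly forest refits are needed during elimination.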


Last Update: 2020-04-10