[1] XU Kui, HAI Yang, XU Yi-fan, et al. Traffic Classification Algorithm Based on Gamma Kernel and Weighted K-Nearest Neighbors[J]. Computer Technology and Development, 2023, 33(02): 214-220. [doi:10.3969/j.issn.1673-629X.2023.02.032]

Traffic Classification Algorithm Based on Gamma Kernel and Weighted K-Nearest Neighbors

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
33
Issue:
2023, No. 02
Pages:
214-220
Section:
New Computing Application Systems
Publication date:
2023-02-10

Article Info

Title:
Traffic Classification Algorithm Based on Gamma Kernel and Weighted K-Nearest Neighbors
Article ID:
1673-629X(2023)02-0214-07
Author(s):
XU Kui (1), HAI Yang (1), XU Yi-fan (2, 3), DUAN Jing-hai (2), SUN Wei-ce (2, 3), TAO Jun (2, 3)
1. Communications Division, Baoji Municipal Public Security Bureau, Baoji 721014, Shaanxi, China;
2. School of Cyber Science and Engineering, Southeast University, Nanjing 211189, Jiangsu, China;
3. Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211189, Jiangsu, China
Keywords:
K-nearest neighbors (KNN); Gamma distribution; self-information; distance function; network traffic classification
CLC number:
TP393
DOI:
10.3969/j.issn.1673-629X.2023.02.032
Abstract:
K-nearest neighbors (KNN) is a simple and effective classification method. It generally performs well when the dataset is balanced and samples of different classes differ significantly. Real datasets are rarely this ideal: network traffic tends to follow a skewed distribution, and the differences between samples are often not significant. To account for both the differences in sample distances and the loss of accuracy caused by an uneven distribution of traffic classes, we propose a traffic classification algorithm based on a Gamma kernel and weighted KNN, which jointly considers the effects of distance and traffic distribution on the classification result. The Gamma distribution function is used as the kernel, and each class is weighted by its self-information. The resulting G-WKNN model is applied to the CIC-IDS2017 dataset. Experimental results show that the model's accuracy is stable at around 0.91 when traffic is balanced, and that it still classifies well when traffic is unbalanced. Compared with several other improved KNN algorithms, G-WKNN achieves higher classification accuracy and better stability, and is relatively insensitive to the value of K. It also markedly improves classification accuracy on minority classes.
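The abstract names the two ingredients of G-WKNN (a Gamma distribution kernel over distance and a self-information weight per class) but does not give the formulas. The following is only a minimal sketch of how such a vote could be combined, not the authors' implementation. It assumes Euclidean distance, a Gamma density with hypothetical `shape` and `scale` parameters (with `shape=1.0` it reduces to exponential decay), self-information computed as -log2 of the class frequency in the training set, and a vote score equal to the product of the two. All function names are illustrative.

```python
import math
from collections import Counter

import numpy as np


def gamma_kernel(d, shape=1.0, scale=1.0):
    """Gamma probability density evaluated at a distance d >= 0.

    shape and scale are assumed hyperparameters, not values from the paper.
    """
    d = np.asarray(d, dtype=float)
    norm = math.gamma(shape) * scale ** shape
    return d ** (shape - 1.0) * np.exp(-d / scale) / norm


def self_information(labels):
    """Map each class c to -log2(p(c)), with p(c) its training frequency."""
    counts = Counter(labels)
    total = len(labels)
    return {c: -math.log2(n / total) for c, n in counts.items()}


def g_wknn_predict(X_train, y_train, x, k=5, shape=1.0, scale=1.0):
    """Classify x by a vote over its k nearest training samples.

    Each neighbor contributes kernel(distance) * self_information(class),
    so close neighbors and rare classes both carry more weight.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = list(y_train)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    info = self_information(y_train)

    scores = {}
    for i in nearest:
        label = y_train[i]
        weight = float(gamma_kernel(dists[i], shape, scale)) * info[label]
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy, made-up data: three majority-class points and two minority-class points.
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]
    y = ["benign", "benign", "benign", "attack", "attack"]
    print(g_wknn_predict(X, y, [5.0, 5.5], k=3))
```

In this sketch the minority class receives a larger self-information weight than the majority class, which is one plausible reading of how the weighting could help minority-class accuracy; the paper itself should be consulted for the actual distance function and weighting rule.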
Last Update: 2023-02-10