[1] CHU Rong-yan, WANG Yu*, YANG Xing-li, et al. A Selection Criterion of Fold K in Cross-validation Based on Regularized KL Distance [J]. Computer Technology and Development, 2021, 31(03): 52-57. [doi:10.3969/j.issn.1673-629X.2021.03.009]

A Selection Criterion of Fold K in Cross-validation Based on Regularized KL Distance

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
2021, No. 03
Pages:
52-57
Section:
Big Data Analysis and Mining
Publication Date:
2021-03-10

Article Info

Title:
A Selection Criterion of Fold K in Cross-validation Based on Regularized KL Distance
Article Number:
1673-629X(2021)03-0052-06
Author(s):
CHU Rong-yan 1, WANG Yu 2,3 *, YANG Xing-li 1, LI Ji-hong 3
1. School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China;
2. School of Modern Educational Technology, Shanxi University, Taiyuan 030006, China;
3. School of Software, Shanxi University, Taiyuan 030006, China
Keywords:
K-fold cross-validation; selection of the fold K; KL (Kullback-Leibler) distance; regularization; machine learning
CLC Number:
TP181
DOI:
10.3969/j.issn.1673-629X.2021.03.009
Abstract:
In machine learning, K-fold cross-validation evaluates and selects models by dividing the data into multiple training and test sets, yet how to choose the fold K remains an open problem. A premise of this data division is that the training set and the test set share the same distribution, which often fails to hold in practice. The fold K can therefore be selected by measuring the distributional consistency between the training and test sets in K-fold cross-validation. Intuitively, the KL (Kullback-Leibler) distance is a suitable measure, since it quantifies the difference between two distributions. However, when K is selected directly from the KL distance, experiments on multiple data sets show that the KL distance increases with K, which is clearly inappropriate. To this end, a selection criterion for the fold K in K-fold cross-validation based on a regularized KL distance is proposed, in which the appropriate K is chosen by minimizing the regularized KL distance. Experiments on multiple real data sets further verify the effectiveness and rationality of the proposed criterion.
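The procedure the abstract describes (measure the KL distance between the empirical distributions of each training/test split, add a regularization term, and pick the K that minimizes the regularized distance) can be sketched as follows. This abstract does not define the paper's actual regularizer or density estimator, so the penalty term `lam / k`, the histogram-based distribution estimates, and all function names below are hypothetical placeholders, not the authors' method:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions, smoothed to avoid log(0)
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def hist_dist(x, bins):
    # Empirical distribution of x over a fixed set of bin edges
    counts, _ = np.histogram(x, bins=bins)
    return counts / counts.sum()

def avg_kfold_kl(x, k, bins, rng):
    # Mean KL distance between train/test histograms over the k folds
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    kls = []
    for i in range(k):
        test = x[folds[i]]
        train = x[np.concatenate([folds[j] for j in range(k) if j != i])]
        kls.append(kl_divergence(hist_dist(train, bins), hist_dist(test, bins)))
    return float(np.mean(kls))

def select_k(x, k_candidates=(2, 3, 5, 10), lam=0.05):
    # Choose K minimizing (mean KL distance) + (hypothetical penalty lam / K);
    # the decreasing penalty offsets the KL distance's growth in K, so the
    # minimum can land at an interior candidate rather than always at K = 2.
    rng = np.random.default_rng(0)
    bins = np.histogram_bin_edges(x, bins=10)
    scores = {k: avg_kfold_kl(x, k, bins, rng) + lam / k for k in k_candidates}
    return min(scores, key=scores.get), scores
```

A usage sketch: for a univariate sample `x = np.random.default_rng(1).normal(size=200)`, `select_k(x)` returns the chosen fold count and the per-K scores, so the trade-off between distributional consistency and the penalty can be inspected directly.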


Last Update: 2020-03-10