[1] LING Yu-long, ZHANG Xiao, LI Xia, et al. Application of Improved kmeans Algorithm in Student Consumption Portrait[J]. Computer Technology and Development, 2021, 31(10): 122-127. [doi:10.3969/j.issn.1673-629X.2021.10.021]

Application of Improved kmeans Algorithm in Student Consumption Portrait

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
2021, No. 10
Pages:
122-127
Column:
Application Frontiers and General Topics
Publication date:
2021-10-10

Article Info

Title:
Application of Improved kmeans Algorithm in Student Consumption Portrait
Article ID:
1673-629X(2021)10-0122-06
Author(s):
LING Yu-long¹, ZHANG Xiao¹, LI Xia², ZHANG Yong¹
1. Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi’an 710129, China;
2. Student Aid Service Center, Northwestern Polytechnical University, Xi’an 710129, China
Keywords:
improved kmeans algorithm; Mahalanobis distance; initial clustering center set; student consumption portrait; precision funding
CLC number:
TP311.13
DOI:
10.3969/j.issn.1673-629X.2021.10.021
Abstract:
Student campus consumption data contain a large amount of high-value information. This paper mines such data from two perspectives: student consumption portraits and precision funding. Motivated by the characteristics of the data set and by the shortcomings of the k-means algorithm, the paper makes two improvements to k-means. First, Mahalanobis distance replaces Euclidean distance to suit the campus consumption data scenario. Second, because randomly selected initial cluster centers are sensitive to outlier samples, the k samples that lie farthest apart within a high-density sample set are chosen as the initial cluster centers. Experiments on three months of student consumption data from a university in Xi’an show that the resulting student group classification model effectively distinguishes students with different behavioral characteristics and portrays student consumption well. A comparison between the list of poor students marked by clustering and the list of poor students identified offline demonstrates the value of the improved k-means algorithm for precision funding.
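The two modifications named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how density is measured or how the k farthest samples are searched, so several choices here are assumptions rather than the authors' method: density is approximated by the distance to the m-th nearest neighbour, the high-density set is the `keep` fraction of samples with the smallest such distance, the k mutually distant samples are picked by a greedy farthest-first pass, and a single inverse covariance matrix estimated from the whole data set is used for every Mahalanobis distance. The names `mahalanobis_matrix`, `select_initial_centers`, `improved_kmeans`, `m`, and `keep` are hypothetical.

```python
import numpy as np


def mahalanobis_matrix(A, B, VI):
    """Pairwise Mahalanobis distances between rows of A and rows of B."""
    diff = A[:, None, :] - B[None, :, :]            # shape (len(A), len(B), d)
    sq = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
    return np.sqrt(np.maximum(sq, 0.0))             # clip tiny negative round-off


def select_initial_centers(X, k, VI, m=5, keep=0.5):
    """Return indices of k mutually distant samples from the high-density subset.

    Assumption: density is judged by the distance to the m-th nearest
    neighbour (smaller distance means denser region).
    """
    D = mahalanobis_matrix(X, X, VI)
    m = min(m, len(X) - 1)
    knn_dist = np.sort(D, axis=1)[:, m]             # column 0 is the sample itself
    dense = np.where(knn_dist <= np.quantile(knn_dist, keep))[0]
    if len(dense) < k:
        raise ValueError("high-density subset is smaller than k")
    chosen = [dense[np.argmin(knn_dist[dense])]]    # start from the densest sample
    while len(chosen) < k:
        # Greedy farthest-first: take the dense sample whose nearest
        # already-chosen center is as far away as possible.
        nearest_chosen = D[np.ix_(dense, chosen)].min(axis=1)
        chosen.append(dense[np.argmax(nearest_chosen)])
    return np.array(chosen)


def improved_kmeans(X, k, n_iter=100):
    """k-means with Mahalanobis assignment and density-based initial centers."""
    X = np.asarray(X, dtype=float)
    # Assumption: one global inverse covariance matrix for all clusters.
    VI = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    centers = X[select_initial_centers(X, k, VI)]
    for _ in range(n_iter):
        labels = mahalanobis_matrix(X, centers, VI).argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = mahalanobis_matrix(X, centers, VI).argmin(axis=1)
    return labels, centers
```

Calling `improved_kmeans(X, k)` on an `(n_samples, n_features)` array returns one cluster label per sample and the final centers. Because isolated samples have a large m-th-neighbour distance, they fall outside the high-density subset and cannot be picked as initial centers, which is the effect the abstract attributes to the second improvement.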


Last Update: 2021-10-10