[1] SHU Zhi-hong, SHEN Su-bin. Communication-efficient Federated Learning from Imbalanced Data[J]. Computer Technology and Development, 2021, 31(12): 33-38. [doi:10.3969/j.issn.1673-629X.2021.12.006]

Communication-efficient Federated Learning from Imbalanced Data

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
No. 12, 2021
Pages:
33-38
Column:
Big Data Analysis and Mining
Publication date:
2021-12-10

Article Info

Title:
Communication-efficient Federated Learning from Imbalanced Data
Article ID:
1673-629X(2021)12-0033-06
Author(s):
SHU Zhi-hong¹, SHEN Su-bin²
1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210046, China;
2. National Engineering Research Center on Communication and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
Keywords:
federated learning; machine learning; imbalanced data; Hellinger distance; aggregation
CLC number:
TP181
DOI:
10.3969/j.issn.1673-629X.2021.12.006
Abstract:
Federated learning (FL) is a distributed machine learning method in which a central server aggregates machine learning models trained locally on mobile terminals, allowing multiple participants to collaborate on machine learning efficiently. Because FL does not require terminals to send their private data to the central server, it also protects data privacy. Unlike ordinary training data sets, however, the data on terminal systems is distributed in an imbalanced way, which reduces the communication efficiency of FL. To address this problem, an FL algorithm based on data-distribution-weighted aggregation is proposed. The degree of balance of each participant's local data set is quantified by the Hellinger distance between that local data set and a balanced data set, and the participant's weight in aggregation is adjusted accordingly, reducing the number of communication rounds needed for the algorithm to converge or reach a target accuracy. The proposed algorithm was evaluated in simulation experiments on public data sets. The results show that, compared with the latest algorithm, Federated Averaging, it reduces communication cost by more than 14.6%, effectively improving the communication efficiency of FL when data is imbalanced.
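The idea in the abstract can be illustrated with a minimal Python sketch. It measures how far each participant's label distribution is from a balanced (uniform) one using the Hellinger distance, then turns those distances into aggregation weights. The abstract does not give the weighting rule, so the choice of a weight proportional to `1 - H` is purely an assumption for illustration and should not be read as the authors' formula; the function names `hellinger_to_uniform`, `aggregation_weights`, and `weighted_aggregate` are likewise hypothetical.

```python
import numpy as np


def hellinger_to_uniform(label_counts):
    """Hellinger distance between a client's label distribution and the uniform one."""
    counts = np.asarray(label_counts, dtype=float)
    if counts.sum() <= 0:
        raise ValueError("a client must hold at least one sample")
    p = counts / counts.sum()                 # empirical label distribution
    q = np.full_like(p, 1.0 / p.size)         # balanced reference distribution
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def aggregation_weights(client_label_counts):
    """Normalized weights; more balanced clients get larger weights.

    Assumption: weight is proportional to (1 - H). The paper's actual rule
    is not stated in the abstract.
    """
    scores = np.array([1.0 - hellinger_to_uniform(c) for c in client_label_counts])
    return scores / scores.sum()


def weighted_aggregate(client_params, weights):
    """Weighted average of client parameter vectors (one row per client)."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in client_params])
    return np.tensordot(np.asarray(weights, dtype=float), stacked, axes=1)


if __name__ == "__main__":
    # Two hypothetical clients: one balanced, one holding a single class.
    counts = [[50, 50], [100, 0]]
    w = aggregation_weights(counts)
    params = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]
    print(w, weighted_aggregate(params, w))
```

In this sketch the balanced client receives the larger weight, so its local model contributes more to the aggregated model. Federated Averaging, by contrast, weights clients by their number of local samples.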


Last Update: 2021-12-10