

Communication-efficient Federated Learning from Imbalanced Data (在不平衡数据中进行高效通信的联邦学习)


Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 31
Issue:: 2021, No. 12

Pages:: 33-38

Column:: Big Data Analysis and Mining

Publication date:: 2021-12-10

Article Info

Title:: Communication-efficient Federated Learning from Imbalanced Data

Article ID:: 1673-629X(2021)12-0033-06

Author(s):: SHU Zhi-hong (舒志鸿)¹; SHEN Su-bin (沈苏彬)²; 1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210046, China;
2. National Engineering Research Center on Communication and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210046, China

Keywords:: federated learning; machine learning; imbalanced data; Hellinger distance; aggregation

CLC number:: TP181

DOI:: 10.3969/j.issn.1673-629X.2021.12.006

Abstract:: Federated learning (FL) is a distributed machine learning method in which a central server aggregates models trained locally on mobile terminals, so that multiple participants can train a model collaboratively and efficiently. Because FL does not send the terminals' private data to the central server, it also protects data privacy. Unlike ordinary training data sets, however, the data held by terminals is imbalanced, which reduces the communication efficiency of FL. To address this problem, an FL algorithm based on data-distribution-weighted aggregation is proposed. The degree of balance of each participant's local data set is quantified by the Hellinger distance between that data set and a balanced data set, and each participant's aggregation weight is adjusted accordingly, reducing the number of communication rounds needed for the algorithm to converge or to reach a target accuracy. The proposed algorithm is evaluated in simulation experiments on public data sets. The results show that, compared with the state-of-the-art Federated Averaging algorithm, communication cost is reduced by more than 14.6%, effectively improving the communication efficiency of FL when data is imbalanced.
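The weighting idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the Hellinger distance is mapped to an aggregation weight, so the rule used here (weight proportional to 1 − H, normalised over clients) is an assumption made purely for illustration and is not the paper's formula. The "balanced data set" is assumed to have a uniform class distribution. The function names `hellinger_to_uniform`, `aggregation_weights`, and `aggregate` are hypothetical.

```python
import math
from typing import List, Sequence


def hellinger_to_uniform(class_counts: Sequence[int]) -> float:
    """Hellinger distance between a client's label distribution and the
    uniform (balanced) distribution over the same classes. Range: [0, 1)."""
    total = sum(class_counts)
    if total <= 0:
        raise ValueError("client has no samples")
    q = 1.0 / len(class_counts)  # probability of each class when balanced
    squared = sum((math.sqrt(c / total) - math.sqrt(q)) ** 2 for c in class_counts)
    return math.sqrt(squared / 2.0)


def aggregation_weights(counts_per_client: Sequence[Sequence[int]]) -> List[float]:
    """ASSUMPTION: a client's weight is proportional to 1 - H, so more balanced
    clients contribute more. The paper's actual mapping may differ."""
    scores = [1.0 - hellinger_to_uniform(counts) for counts in counts_per_client]
    total = sum(scores)  # always > 0 because H < 1 against a uniform reference
    return [s / total for s in scores]


def aggregate(models: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of flattened client parameter vectors."""
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]


if __name__ == "__main__":
    # Two clients, four classes: one balanced, one heavily skewed.
    counts = [[25, 25, 25, 25], [97, 1, 1, 1]]
    weights = aggregation_weights(counts)
    global_model = aggregate([[0.2, -1.0], [0.8, 1.0]], weights)
    print(weights, global_model)
```

For comparison, Federated Averaging weights each client by its share of the total sample count and ignores the label distribution; in this sketch the balanced client receives the larger weight even though both clients hold 100 samples.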

