[1] GUO Xin, WANG Wei, QING Wei, et al. Research on Generalization of Multi-agent Based on Reinforcement Learning [J]. Computer Technology and Development, 2023, 33(04): 114-119. [doi:10.3969/j.issn.1673-629X.2023.04.017]

Research on Generalization of Multi-agent Based on Reinforcement Learning

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume: 33
Issue: 2023, No. 04
Pages: 114-119
Section: Artificial Intelligence
Publication Date: 2023-04-10

Article Info

Title: Research on Generalization of Multi-agent Based on Reinforcement Learning
Article ID: 1673-629X(2023)04-0114-06
Authors: GUO Xin (郭鑫), WANG Wei (王微), QING Wei (青伟), LI Jian (李剑), HE Zhao-feng (何召锋)
Affiliation: Beijing University of Posts and Telecommunications, Beijing 100088, China
Keywords: deep reinforcement learning; multi-agent; unknown environment; policy ensemble; generalization; scalability
CLC Number: TP181
DOI: 10.3969/j.issn.1673-629X.2023.04.017
Abstract:
In research on multi-agent reinforcement learning algorithms, the training and testing environments differ, so how to make agents cope effectively with changes in the policies of the other agents in the environment has attracted wide attention from researchers. To address this generalization problem, a human-preference based multi-agent role policy ensemble algorithm is proposed, which considers both long-term and immediate rewards. This improvement lets an agent select, from candidate actions with good long-term cumulative returns, the action with the largest immediate reward, so that the algorithm determines the direction of policy updates, avoids excessive exploration and ineffective training, and finds the optimal policy quickly. In addition, agents are dynamically divided into different roles according to the immediate rewards of their historical actions, and agents with the same role share parameters, which not only improves efficiency but also achieves the scalability of the multi-agent algorithm. Comparisons with existing algorithms in the multi-agent particle environment show that agents trained with the proposed algorithm generalize better to unknown environments, converge faster, and learn the optimal policy more efficiently.
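The action-selection rule summarized above — keep a few candidate actions with good long-term cumulative return, then take the candidate with the largest immediate reward — can be illustrated with a minimal sketch. The function name, the NumPy-based representation, and the top-k candidate cutoff below are assumptions for illustration, not the authors' implementation:

import numpy as np

def select_action(q_values, immediate_rewards, k=3):
    """Hedged sketch of the selection rule described in the abstract.

    q_values:           per-action estimates of long-term cumulative return
    immediate_rewards:  per-action estimates of immediate reward
    k:                  assumed size of the candidate set (illustrative)
    """
    q_values = np.asarray(q_values, dtype=float)
    immediate_rewards = np.asarray(immediate_rewards, dtype=float)
    # Candidate set: the k actions with the best long-term return estimates.
    candidates = np.argsort(q_values)[-k:]
    # Final choice: the candidate with the largest immediate reward.
    return int(candidates[np.argmax(immediate_rewards[candidates])])

# Example: action 2 has the best long-term estimate, but among the top-3
# candidates action 4 offers the largest immediate reward and is selected.
q = [0.1, 0.8, 0.9, 0.2, 0.85]
r = [0.0, 0.3, 0.1, 0.9, 0.6]
print(select_action(q, r, k=3))  # prints 4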

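The role mechanism summarized in the abstract — dynamically grouping agents by the immediate rewards of their historical actions and letting same-role agents share parameters — can be sketched in the same spirit. The class name, the ranking-based grouping rule, and the number of roles are illustrative assumptions rather than the paper's exact procedure:

import numpy as np

class RolePolicyPool:
    """Hedged sketch: one shared parameter set per role."""

    def __init__(self, n_roles, make_policy):
        self.n_roles = n_roles
        # Same-role agents index into the same policy object, so updates
        # made for one agent are shared by every agent with that role.
        self.policies = [make_policy() for _ in range(n_roles)]

    def assign_roles(self, reward_histories):
        """Group agents by the mean immediate reward of their recent actions."""
        scores = np.array([np.mean(h) for h in reward_histories])
        order = np.argsort(scores)              # agents ranked by score
        roles = np.empty(len(scores), dtype=int)
        for rank, agent in enumerate(order):
            # Split the ranking into n_roles equal-sized groups.
            roles[agent] = min(rank * self.n_roles // len(scores),
                               self.n_roles - 1)
        return roles                            # roles[i]: policy index of agent i

    def policy_for(self, role):
        return self.policies[role]

# Example with four agents and two roles: the two low-reward agents end up
# in role 0 and the two high-reward agents in role 1, sharing parameters.
pool = RolePolicyPool(n_roles=2, make_policy=lambda: {"theta": np.zeros(4)})
roles = pool.assign_roles([[0.1, 0.2], [0.9, 1.1], [0.15, 0.1], [1.0, 0.8]])
print(roles)  # [0 1 0 1]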

Last Update: 2023-04-10