Overview on Game Training Methods of Multi-agent Reinforcement Learning - Computer Technology and Development (《计算机技术与发展》)

Article Info

Title:: Overview on Game Training Methods of Multi-agent Reinforcement Learning

Author(s):: ZHANG Ren-wen; LAI Jun*; CHEN Xi-liang; School of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China

Keywords:: multi-agent reinforcement learning; game training method; self-play; population based training; league training

Abstract:: Game training is a training paradigm for multi-agent reinforcement learning and an emerging area of current research in the field. It trains agents primarily through self-play: in an adversarial environment, an agent plays against itself and its own historical versions, with opponents chosen by different opponent-sampling schemes, so that it gains experience and improves steadily. The approach has been applied successfully in typical adversarial settings such as Go, poker, and real-time strategy games. First, the basic theory of game training in multi-agent reinforcement learning is briefly introduced, and the development of game-solving methods is reviewed. Then, according to their underlying principles, game training methods are classified into self-play-based training and game-theory-based training, and representative existing algorithms in each class are introduced, including naive self-play, mature self-play, population-based training, league training, and policy space response oracles, together with practical applications. Next, the key problems and challenges facing game training methods are described, and current typical testing frameworks and platforms are briefly introduced. Finally, the future development of game training methods is discussed.