LEI Ying, XU Dao-yun. A Cooperation Markov Decision Process System[J]. Computer Technology and Development, 2020, 30(12): 8-14. [doi: 10.3969/j.issn.1673-629X.2020.12.002]

A Cooperation Markov Decision Process System

《计算机技术与发展》 (Computer Technology and Development) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume: 30
Issue: 2020, No. 12
Pages: 8-14
Column: Intelligence, Algorithms and Systems Engineering
Publication Date: 2020-12-10

Article Information

Title: A Cooperation Markov Decision Process System
Article ID: 1673-629X(2020)12-0008-07
Authors: LEI Ying, XU Dao-yun (雷莹, 许道云)
Affiliation: School of Computer Science and Technology, Guizhou University, Guiyang 550025, China
Keywords: reinforcement learning; agent; cooperation Markov decision process; optimal pair of strategies; algorithm
CLC Number: TP301
DOI: 10.3969/j.issn.1673-629X.2020.12.002
Abstract:
Reinforcement learning is an important research area in machine learning, and the Markov decision process (MDP) is one of its foundations. The usual Markov decision system considers the learning evolution of only a single agent, which is limiting for many current problems, since more and more applications involve multiple agents. A cooperation Markov decision process (CMDP) with two agents is therefore introduced, suited to the learning evolution of cooperative decisions between two agents. Interactions between agents may be either cooperative or game-theoretic; this paper focuses on the cooperative CMDP. In this learning model the agents take actions alternately and, with social value as the optimization criterion, seek an optimal pair of strategies (π*0, π*1) with which they jointly accomplish the target task. An algorithm for finding the optimal strategy pair in the cooperation Markov system is then given; its essential task is to find an optimal pair (π*0, π*1) and thereby form a cooperation system CMDP(π*0, π*1). The system model can be further extended to a joint decision system with multiple agents.
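
The abstract specifies only the setting — two agents acting in alternation under a single shared ("social") value criterion — and not the concrete procedure. As a purely illustrative reading of that setting (a minimal sketch, not the authors' algorithm), the Python fragment below finds an optimal strategy pair for a toy alternating-turn MDP with a shared reward by running value iteration on the augmented state (state, turn). The transition tensor P, reward tensor R, discount GAMMA, and the 3-state, 2-action toy model are all hypothetical placeholders.

    # Illustrative sketch only (assumed setting, not the paper's algorithm):
    # two agents alternate actions in a shared-reward MDP, and the "social
    # value" is the common discounted return.
    import numpy as np

    N_STATES, N_ACTIONS, GAMMA = 3, 2, 0.9
    rng = np.random.default_rng(0)

    # P[t, s, a, s']: probability that the world moves to s' when agent t
    # takes action a in state s; R[t, s, a]: shared reward for that move.
    P = rng.dirichlet(np.ones(N_STATES), size=(2, N_STATES, N_ACTIONS))
    R = rng.uniform(size=(2, N_STATES, N_ACTIONS))

    # V[t, s]: optimal social value when it is agent t's turn in state s.
    V = np.zeros((2, N_STATES))
    for _ in range(1000):
        # Bellman backup on the augmented state (s, turn): after agent t
        # acts, the turn passes to agent 1 - t, hence V[::-1] on the right.
        Q = R + GAMMA * np.einsum('tsan,tn->tsa', P, V[::-1])
        V_new = Q.max(axis=2)
        if np.max(np.abs(V_new - V)) < 1e-10:
            V = V_new
            break
        V = V_new

    pi0, pi1 = Q.argmax(axis=2)   # greedy strategy pair (π*0, π*1)
    print("agent 0 policy:", pi0, "agent 1 policy:", pi1)

Because the reward is shared, the alternating two-agent problem collapses into a single-agent MDP over (state, turn), so π*0 and π*1 can be read off from the greedy actions; a game-theoretic (non-cooperative) variant of the CMDP would instead require an equilibrium computation.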

Last Update: 2020-12-10