[1] MU Xuan-ting, ZHANG Hong-jun, LIAO Xiang-lin, et al. Rule-guided Agent Decision-Making Framework [J]. Computer Technology and Development, 2022, 32(10): 156-163. [doi:10.3969/j.issn.1673-629X.2022.10.026]

Rule-guided Agent Decision-Making Framework

《计算机技术与发展》(Computer Technology and Development) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
2022, No. 10
Pages:
156-163
Section:
Artificial Intelligence
Publication Date:
2022-10-10

Article Info

Title:
Rule-guided Agent Decision-Making Framework
Article ID:
1673-629X(2022)10-0156-08
Authors:
MU Xuan-ting, ZHANG Hong-jun, LIAO Xiang-lin, ZHANG Le-gui
School of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210000, China
Keywords:
deep reinforcement learning; expert experience; rule; action space; proximal policy optimization algorithm; attention mechanism
CLC Number:
TP391
DOI:
10.3969/j.issn.1673-629X.2022.10.026
Abstract:
Although deep reinforcement learning has achieved breakthroughs in intelligent decision-making in recent years, the large action space of complex scenarios remains a major obstacle to successful learning. The main cause is that an agent without guidance struggles to accumulate enough successful experience: the sample data are of low quality and the model fails to converge correctly. Incorporating human knowledge as an aid is an effective remedy. To this end, a rule-guided agent decision-making framework is proposed and its overall composition is described. To address the exploration difficulty caused by invalid actions under different situations, a rule-guided agent decision-making method is proposed: a simple agent network is built from the proximal policy optimization (PPO) algorithm and an attention mechanism, and expert experience is used to design a rule guidance layer that dynamically constrains the agent's action space according to situational features. Experimental results show that the proposed method improves the agent's score on the StarCraft II minigame "BuildMarines", and that the agent retains part of its performance even after the rule guidance layer is removed.
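The abstract describes the rule guidance layer only at a high level: expert rules read the current situation and dynamically constrain which actions the agent may take. A common way to realize such a constraint on top of a PPO-style categorical policy is invalid-action masking on the logits. The following is a minimal PyTorch sketch of that general pattern, not the paper's implementation; the class name RuleGuidanceLayer, the example rules, and the resource thresholds are all illustrative assumptions.

```python
import torch
from torch.distributions import Categorical

class RuleGuidanceLayer:
    """Hypothetical rule guidance layer: hand-written expert rules map
    situational features to a boolean mask of currently valid actions."""

    def __init__(self, n_actions: int):
        self.n_actions = n_actions

    def valid_action_mask(self, situation: dict) -> torch.Tensor:
        # Illustrative rules only; the paper's actual rules for the
        # "BuildMarines" minigame are not given in the abstract.
        mask = torch.ones(self.n_actions, dtype=torch.bool)
        if situation.get("minerals", 0) < 50:
            mask[3] = False   # e.g. cannot afford "train marine"
        if situation.get("idle_barracks", 0) == 0:
            mask[3] = False   # e.g. no barracks available to train from
        return mask

def masked_policy(logits: torch.Tensor, mask: torch.Tensor) -> Categorical:
    """Set invalid-action logits to a large negative value so the policy
    assigns those actions (effectively) zero probability."""
    masked = torch.where(mask, logits, torch.full_like(logits, -1e9))
    return Categorical(logits=masked)

# Usage: sample an action from the dynamically constrained policy.
n_actions = 8
guide = RuleGuidanceLayer(n_actions)
logits = torch.randn(n_actions)  # stand-in for the agent network's output
mask = guide.valid_action_mask({"minerals": 40, "idle_barracks": 1})
dist = masked_policy(logits, mask)
action = dist.sample()
log_prob = dist.log_prob(action)  # feeds PPO's clipped surrogate objective
```

Masking the logits before building the distribution keeps the probabilities normalized over the valid actions only, so PPO's importance ratio and entropy bonus remain well-defined; it also makes it straightforward to remove the guidance at evaluation time, as the experiments do, by passing an all-ones mask.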


Last Update: 2022-10-10