[1] MU Xuan-ting, ZHANG Hong-jun, LIAO Xiang-lin, et al. Rule-guided Agent Decision-Making Framework [J]. Computer Technology and Development, 2022, 32(10): 156-163. [doi:10.3969/j.issn.1673-629X.2022.10.026]

Rule-guided Agent Decision-Making Framework

《计算机技术与发展》(Computer Technology and Development) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
32
Issue:
2022, No. 10
Pages:
156-163
Section:
Artificial Intelligence
Publication Date:
2022-10-10

Article Info

Title:
Rule-guided Agent Decision-Making Framework
Article ID:
1673-629X(2022)10-0156-08
Authors:
MU Xuan-ting, ZHANG Hong-jun, LIAO Xiang-lin, ZHANG Le-gui
School of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210000, China
Keywords:
deep reinforcement learning; expert experience; rule; action space; proximal policy optimization algorithm; attention mechanism
CLC Number:
TP391
DOI:
10.3969/j.issn.1673-629X.2022.10.026
Abstract:
Although deep reinforcement learning has achieved breakthroughs in intelligent decision-making in recent years, the large action space of complex scenarios remains a major obstacle to successful learning. The main cause is that an agent without guidance struggles to accumulate enough successful experience: the sample data are of low quality and the model fails to converge correctly. Incorporating human knowledge as an aid is an effective remedy. To this end, a rule-guided agent decision-making framework is proposed and its overall composition is described. To address the exploration difficulty caused by invalid actions under different situations, a rule-guided agent decision-making method is proposed: a simple agent network is built from the proximal policy optimization (PPO) algorithm and an attention mechanism, and expert experience is used to design a rule guidance layer that dynamically constrains the agent's action space according to situational features. Experimental results show that the proposed method improves the agent's score on the StarCraft II minigame "BuildMarines", and that the agent retains part of its performance even after the rule guidance layer is removed.
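The abstract describes the rule guidance layer only at a high level: expert rules read the current situation and dynamically constrain which actions the agent may take. A common way to realize such a constraint on top of a PPO-style categorical policy is invalid-action masking on the logits. The following is a minimal PyTorch sketch of that general pattern, not the paper's implementation; the class name RuleGuidanceLayer, the example rules, and the resource thresholds are all illustrative assumptions.

```python
import torch
from torch.distributions import Categorical

class RuleGuidanceLayer:
    """Hypothetical rule guidance layer: hand-written expert rules map
    situational features to a boolean mask of currently valid actions."""

    def __init__(self, n_actions: int):
        self.n_actions = n_actions

    def valid_action_mask(self, situation: dict) -> torch.Tensor:
        # Illustrative rules only; the paper's actual rules for the
        # "BuildMarines" minigame are not given in the abstract.
        mask = torch.ones(self.n_actions, dtype=torch.bool)
        if situation.get("minerals", 0) < 50:
            mask[3] = False   # e.g. cannot afford "train marine"
        if situation.get("idle_barracks", 0) == 0:
            mask[3] = False   # e.g. no barracks available to train from
        return mask

def masked_policy(logits: torch.Tensor, mask: torch.Tensor) -> Categorical:
    """Set invalid-action logits to a large negative value so the policy
    assigns those actions (effectively) zero probability."""
    masked = torch.where(mask, logits, torch.full_like(logits, -1e9))
    return Categorical(logits=masked)

# Usage: sample an action from the dynamically constrained policy.
n_actions = 8
guide = RuleGuidanceLayer(n_actions)
logits = torch.randn(n_actions)  # stand-in for the agent network's output
mask = guide.valid_action_mask({"minerals": 40, "idle_barracks": 1})
dist = masked_policy(logits, mask)
action = dist.sample()
log_prob = dist.log_prob(action)  # feeds PPO's clipped surrogate objective
```

Masking the logits before building the distribution keeps the probabilities normalized over the valid actions only, so PPO's importance ratio and entropy bonus remain well-defined; it also makes it straightforward to remove the guidance at evaluation time, as the experiments do, by passing an all-ones mask.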


Last Update: 2022-10-10