[1] KUANG Li-qun, FENG Li, HAN Xie, et al. Research on Intelligent Decision-making System Based on Double Deep Q-Network[J]. Computer Technology and Development, 2022, 32(02): 137-142. [doi:10.3969/j.issn.1673-629X.2022.02.022]

Research on Intelligent Decision-making System Based on Double Deep Q-Network

Computer Technology and Development [ISSN: 1673-629X / CN: 61-1450/TP]

Volume: 32
Issue: 2022(02)
Pages: 137-142
Section: Frontier and Comprehensive Applications
Publication Date: 2022-02-10

Article Info

Title:
Research on Intelligent Decision-making System Based on Double Deep Q-Network
Article ID:
1673-629X(2022)02-0137-06
Author(s):
KUANG Li-qun 1, FENG Li 1, HAN Xie 1, JIA Jiong-hao 2, GUO Guang-xing 3
1. School of Data Science and Technology, North University of China, Taiyuan 030051, China;
2. North Automatic Control Technology Institute, Taiyuan 030006, China;
3. School of Geography Science, Taiyuan Normal University, Taiyuan 030006, China
Keywords:
deep reinforcement learning; deep Q-network; confrontation drill; simulation training; Unity3D
CLC Number:
TP391.9
DOI:
10.3969/j.issn.1673-629X.2022.02.022
Abstract:
At present, the classical algorithms used in intelligent decision-making systems offer only a low degree of intelligence, while applying more advanced reinforcement learning algorithms to complex decision-making tasks leads to a curse of dimensionality in storage. To address this problem, an intelligent decision-making algorithm based on a double deep Q-network (Double DQN) is proposed: the calculation of the target Q value is improved, and action selection is decoupled from policy evaluation, yielding more stable and effective policies. The agent is trained on input states and outputs a near-optimal action that drives its behavior, including environment perception, action perception and task coordination, so that it can complete a given task in a highly complex decision-making environment. A verification system for virtual intelligent confrontation drills is developed on the Unity3D game engine; it visualizes the real-time state of the drill and the agent's training results, verifies the correctness and stability of the double deep Q-network model, and effectively resolves the dimensionality problem of the reinforcement learning algorithm. The proposed algorithm is expected to be useful in strategy games, confrontation drills, mission plan evaluation and other fields.
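
To make the decoupling of action selection and policy evaluation concrete, below is a minimal PyTorch sketch of the Double DQN target computation described in the abstract. It is not the authors' implementation; the network architecture, layer sizes and hyper-parameters are illustrative assumptions.

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small fully connected Q-network; the layer sizes are assumptions."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def double_dqn_target(online: QNet, target: QNet,
                      r: torch.Tensor, s_next: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    # Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
    with torch.no_grad():
        # Action selection is performed by the online network ...
        a_star = online(s_next).argmax(dim=1, keepdim=True)
        # ... while policy evaluation uses the separate target network.
        q_next = target(s_next).gather(1, a_star).squeeze(1)
        # Vanilla DQN would instead use target(s_next).max(dim=1).values,
        # coupling selection and evaluation and overestimating Q values.
        return r + gamma * (1.0 - done) * q_next

During training, the online network would be regressed toward y, e.g. with torch.nn.functional.smooth_l1_loss(online(s).gather(1, a).squeeze(1), y), and the target network's weights would be copied from the online network every fixed number of steps, which is what stabilizes the learned policy.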

Last Update: 2022-02-10