[1] CHEN Song, SHEN Su-bin. A Path Planning Method Based on Improved Reinforcement Learning Algorithm [J]. Computer Technology and Development, 2025, (02): 115-121. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0308.

A Path Planning Method Based on Improved Reinforcement Learning Algorithm

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2025, No. 02
Pages:
115-121
Section:
Artificial Intelligence
Publication Date:
2025-02-10

Article Information

Title:
A Path Planning Method Based on Improved Reinforcement Learning Algorithm
Article ID:
1673-629X(2025)02-0115-07
Author(s):
CHEN Song 1, SHEN Su-bin 2
1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China;
2. National Engineering Research Center on Communication and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
Q-learning algorithm; causal model; causal inference; confounding factor; backdoor adjustment
CLC Number:
TP301.6
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0308
Abstract:
Improving the data efficiency and decision accuracy of the Q-learning algorithm in complex environments is a key challenge in algorithm performance optimization. Introducing causal models into Q-learning and improving its performance by revealing the causal relationships between variables is an emerging and active research direction. We propose a causal-model-based Q-learning algorithm, the C-Q learning (Causal-model based Q-learning) algorithm. The algorithm builds a structural causal model from the causal relationships between the key variables involved as the agent interacts with the environment under Q-learning, and applies the backdoor adjustment method from causal inference theory to remove the confounding effect caused by the confounding factors that affect the reward. This yields more accurate Q-value estimates, accurately identifies the action most likely to obtain the highest reward in each state, and thereby optimizes the action selection process of Q-learning. Finally, the Q-learning, Eva-Q learning, and C-Q learning algorithms are compared in simulation experiments in a grid environment. The simulation results show that C-Q learning outperforms the other two algorithms on multiple metrics, including path length, planning time, data efficiency, and decision accuracy.
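To make the backdoor-adjustment step described in the abstract concrete, the following is a minimal Python sketch, not the authors' C-Q implementation: it assumes a hypothetical tabular grid-world agent (the class name BackdoorQAgent and all variable names are illustrative) that observes a discrete confounder z alongside each transition, estimates the interventional reward E[R | do(a), s] = Σ_z E[R | s, a, z] · P(z) by reweighting per-confounder reward averages with the empirical marginal P(z), and feeds that adjusted estimate into the Q-learning update so that the greedy action choice reflects deconfounded value estimates.

import random
from collections import defaultdict

class BackdoorQAgent:
    """Tabular Q-learning with a backdoor-adjusted reward estimate.

    Illustrative sketch only: z is assumed to be an observed discrete
    confounder affecting the reward, and the adjustment
    E[R | do(a), s] = sum_z E[R | s, a, z] * P(z) reweights
    per-confounder reward averages by the empirical P(z)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)        # Q(s, a)
        self.r_sum = defaultdict(float)    # cumulative reward for (s, a, z)
        self.r_cnt = defaultdict(int)      # visit count for (s, a, z)
        self.z_cnt = defaultdict(int)      # marginal count of z
        self.z_total = 0

    def adjusted_reward(self, s, a):
        # Backdoor-adjusted expected reward for taking action a in state s.
        if self.z_total == 0:
            return 0.0
        est = 0.0
        for z, n_z in self.z_cnt.items():
            n = self.r_cnt[(s, a, z)]
            mean_r = self.r_sum[(s, a, z)] / n if n else 0.0
            est += mean_r * (n_z / self.z_total)   # weight by empirical P(z)
        return est

    def select_action(self, s):
        # Epsilon-greedy over the learned Q values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s_next, z):
        # Record the confounder statistics observed with this transition.
        self.r_sum[(s, a, z)] += r
        self.r_cnt[(s, a, z)] += 1
        self.z_cnt[z] += 1
        self.z_total += 1
        # Standard Q-learning target, with the backdoor-adjusted reward
        # used in place of the raw (possibly confounded) reward r.
        r_adj = self.adjusted_reward(s, a)
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r_adj + self.gamma * best_next - self.q[(s, a)])

# Hypothetical usage in a 4-action grid world (env_step is assumed, not defined here):
#   agent = BackdoorQAgent(actions=["up", "down", "left", "right"])
#   a = agent.select_action(state)
#   next_state, reward, z = env_step(state, a)
#   agent.update(state, a, reward, next_state, z)

The sketch keeps the adjustment in its textbook form; the construction of the structural causal model and the Eva-Q baseline mentioned in the abstract are not reproduced here.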

Last Update: 2025-02-10