[1] CHEN Song, SHEN Su-bin. A Path Planning Method Based on Improved Reinforcement Learning Algorithm [J]. Computer Technology and Development, 2025, (02): 115-121. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0308.

A Path Planning Method Based on Improved Reinforcement Learning Algorithm

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2025, No. 02
Pages:
115-121
Section:
Artificial Intelligence
Publication Date:
2025-02-10

Article Information

Title:
A Path Planning Method Based on Improved Reinforcement Learning Algorithm
Article ID:
1673-629X(2025)02-0115-07
Author(s):
CHEN Song 1, SHEN Su-bin 2
1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China;
2. National Engineering Research Center on Communication and Networking, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
Q-learning algorithm; causal model; causal inference; confounding factor; backdoor adjustment
CLC Number:
TP301.6
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0308
Abstract:
Improving the data efficiency and decision accuracy of the Q-learning algorithm in complex environments is a key challenge in algorithm performance optimization. Introducing causal models into Q-learning and improving its performance by revealing the causal relationships between variables is an emerging and active research direction. We propose a causal-model-based Q-learning algorithm, the C-Q learning (Causal-model based Q-learning) algorithm. The algorithm builds a structural causal model from the causal relationships between the key variables involved as the agent interacts with the environment under Q-learning, and applies the backdoor adjustment method from causal inference theory to remove the confounding effect caused by the confounding factors that affect the reward. This yields more accurate Q-value estimates, accurately identifies the action most likely to obtain the highest reward in each state, and thereby optimizes the action selection process of Q-learning. Finally, the Q-learning, Eva-Q learning, and C-Q learning algorithms are compared in simulation experiments in a grid environment. The simulation results show that C-Q learning outperforms the other two algorithms on multiple metrics, including path length, planning time, data efficiency, and decision accuracy.
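To make the backdoor-adjustment step described in the abstract concrete, the following is a minimal Python sketch, not the authors' C-Q implementation: it assumes a hypothetical tabular grid-world agent (the class name BackdoorQAgent and all variable names are illustrative) that observes a discrete confounder z alongside each transition, estimates the interventional reward E[R | do(a), s] = Σ_z E[R | s, a, z] · P(z) by reweighting per-confounder reward averages with the empirical marginal P(z), and feeds that adjusted estimate into the Q-learning update so that the greedy action choice reflects deconfounded value estimates.

import random
from collections import defaultdict

class BackdoorQAgent:
    """Tabular Q-learning with a backdoor-adjusted reward estimate.

    Illustrative sketch only: z is assumed to be an observed discrete
    confounder affecting the reward, and the adjustment
    E[R | do(a), s] = sum_z E[R | s, a, z] * P(z) reweights
    per-confounder reward averages by the empirical P(z)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)        # Q(s, a)
        self.r_sum = defaultdict(float)    # cumulative reward for (s, a, z)
        self.r_cnt = defaultdict(int)      # visit count for (s, a, z)
        self.z_cnt = defaultdict(int)      # marginal count of z
        self.z_total = 0

    def adjusted_reward(self, s, a):
        # Backdoor-adjusted expected reward for taking action a in state s.
        if self.z_total == 0:
            return 0.0
        est = 0.0
        for z, n_z in self.z_cnt.items():
            n = self.r_cnt[(s, a, z)]
            mean_r = self.r_sum[(s, a, z)] / n if n else 0.0
            est += mean_r * (n_z / self.z_total)   # weight by empirical P(z)
        return est

    def select_action(self, s):
        # Epsilon-greedy over the learned Q values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s_next, z):
        # Record the confounder statistics observed with this transition.
        self.r_sum[(s, a, z)] += r
        self.r_cnt[(s, a, z)] += 1
        self.z_cnt[z] += 1
        self.z_total += 1
        # Standard Q-learning target, with the backdoor-adjusted reward
        # used in place of the raw (possibly confounded) reward r.
        r_adj = self.adjusted_reward(s, a)
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r_adj + self.gamma * best_next - self.q[(s, a)])

# Hypothetical usage in a 4-action grid world (env_step is assumed, not defined here):
#   agent = BackdoorQAgent(actions=["up", "down", "left", "right"])
#   a = agent.select_action(state)
#   next_state, reward, z = env_step(state, a)
#   agent.update(state, a, reward, next_state, z)

The sketch keeps the adjustment in its textbook form; the construction of the structural causal model and the Eva-Q baseline mentioned in the abstract are not reproduced here.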

Last Update: 2025-02-10