
A Control Method of Traffic Signals Based on Parameter Fusion of Q-learning (基于参数融合的 Q 学习交通信号控制方法)

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 28
Issue:: 2018, No. 11

Pages:: 48-51

Column:: Intelligence, Algorithms, and Systems Engineering

Publication date:: 2018-11-10

Article Info

Title:: A Control Method of Traffic Signals Based on Parameter Fusion of Q-learning

Article ID:: 1673-629X(2018)11-0048-04

Author(s):: LIU Cheng-jian (刘成健); LUO Jie (罗杰); School of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210000, Jiangsu, China

Keywords:: intersection; Q-learning; fuzzy logic; parameter fusion; phase timing

CLC number:: TP301

DOI:: 10.3969/j.issn.1673-629X.2018.11.011

Document code:: A

Abstract:: Traditional Q-learning traffic control methods often suffer from a dimensionality problem caused by the randomness and uncertainty of urban road traffic flow, which reduces the learning efficiency and response speed of the control system. To address this problem, we propose an improved adaptive intersection traffic signal control method based on Q-learning. A control strategy that combines fuzzy logic with the Q-learning algorithm improves the reward and punishment mechanism of Q-learning. In addition, an experience-based state partition is introduced to optimize the state space, and a traffic parameter fusion function reduces the complexity of storing and updating the state space while still evaluating the traffic state with multiple parameters. Because the state space of a variable-cycle Q-learning timing scheme is too large, a phase-based green time allocation scheme is given, which ultimately achieves real-time responsive control of traffic flow. The control method is evaluated in simulation software, and the simulation results show that it outperforms traditional control.
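The abstract does not give the fusion function, the state partition, or the fuzzy reward rule, so the following is only a minimal sketch of the general idea: several traffic parameters are fused into one scalar, the scalar is binned into a few states, and a tabular Q-learner picks a green duration per phase. All names, weights, thresholds, and candidate green times here (`fuse`, `to_state`, `PhaseQLearner`, `W_QUEUE`, `THRESHOLDS`, `GREEN_TIMES`, and so on) are illustrative assumptions, not the authors' values. The reward is left to the caller and stands in for the paper's fuzzy-logic reward, which is not reproduced.

```python
import random
from collections import defaultdict

# Hypothetical constants; the paper's actual values are not in the abstract.
W_QUEUE, W_DELAY = 0.6, 0.4          # assumed fusion weights
MAX_QUEUE, MAX_DELAY = 40.0, 120.0   # assumed normalisation bounds (vehicles, s)
THRESHOLDS = (0.25, 0.5, 0.75)       # assumed experience-based state boundaries
GREEN_TIMES = (15, 25, 35, 45)       # assumed candidate green durations (s)


def fuse(queue_len, delay):
    """Fuse two traffic parameters into one congestion score in [0, 1]."""
    q = min(queue_len / MAX_QUEUE, 1.0)
    d = min(delay / MAX_DELAY, 1.0)
    return W_QUEUE * q + W_DELAY * d


def to_state(fused):
    """Map the fused score to a small discrete state index (0..3)."""
    return sum(fused >= t for t in THRESHOLDS)


class PhaseQLearner:
    """Tabular Q-learning over (phase, state, action) triples."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)      # unseen entries default to 0.0
        self.rng = random.Random(seed)

    def choose(self, phase, state):
        """Epsilon-greedy choice of an index into GREEN_TIMES."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(GREEN_TIMES))
        return max(range(len(GREEN_TIMES)),
                   key=lambda a: self.q[(phase, state, a)])

    def update(self, phase, state, action, reward, next_phase, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[(next_phase, next_state, a)]
                        for a in range(len(GREEN_TIMES)))
        key = (phase, state, action)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

Because the table is indexed by one fused state per phase instead of by every combination of raw parameters, its size grows with the number of bins rather than with the product of the parameter ranges, which is the kind of storage and update saving the abstract describes.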

