

An Overview of Deep Reinforcement Learning Based on Curriculum Learning


Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 32
Issue:: 2022, No. 11

Pages:: 16-23

Column:: Survey

Publication date:: 2022-11-10

Article Info

Title:: An Overview of Deep Reinforcement Learning Based on Curriculum Learning

Article ID:: 1673-629X(2022)11-0016-08

Author(s):: LIN Ze-yang (林泽阳); LAI Jun (赖俊); CHEN Xi-liang (陈希亮); School of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China

Keywords:: reinforcement learning; deep learning; deep reinforcement learning; curriculum learning; transfer learning

CLC number:: TP181

DOI:: 10.3969/j.issn.1673-629X.2022.11.003

Abstract:: As a machine learning approach to sequential decision-making, reinforcement learning learns an optimal policy through interactive trial and error, which matches the way humans make intelligent decisions. Deep reinforcement learning based on curriculum learning is an active research topic in the field of reinforcement learning. It targets the low learning efficiency and poor convergence that reinforcement learning agents face in high-dimensional state and action spaces: by extracting the common knowledge acquired while training on one or more simple source tasks, it accelerates or improves the learning of a complex target task. This paper first introduces the fundamentals of curriculum learning and then surveys the latest research progress on curriculum learning in deep reinforcement learning from four perspectives: curriculum learning based on network optimization, curriculum learning based on multi-agent cooperation, curriculum learning based on ability evaluation, and curriculum learning based on functions. It then analyzes the latest developments in curriculum reinforcement learning and summarizes the existing problems of curriculum learning in deep reinforcement learning together with possible solutions. Finally, based on current applications of curriculum learning in deep reinforcement learning, it outlines the development and research directions of curriculum reinforcement learning.

