Similar Literature / References:
[1] FENG Lin, LI Chen, SUN Tao. A Reinforcement Learning Method for Robocup Soccer Half Field Defense[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2008, (01): 59.
[2] TANG Ping-ping, WANG Hong-bing. Web Service Composition Based on Reinforcement Learning[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2008, (03): 142.
[3] WANG Zhao-hui, SUN Hui-ping. Research of IRRL Model in Image Retrieval[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2008, (12): 35.
[4] LIN Lian-ming, WANG Hao, WANG Yi-xiong. Sarsa Reinforcement Learning Algorithm Based on Neural Networks[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2006, (01): 30.
[5] CHEN Bo, WANG Huan, TANG Lun. Power Allocation Based on Multi-user Fairness in Cognitive Wireless Networks[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2011, (04): 77.
[6] GAO Hui, FENG You-hong, WANG Xiao-yu. Research on Tradeoff of Energy Consumption and Throughput in Cognitive Wireless Sensor Networks[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2017, 27(10): 130.
[7] LEI Ying, XU Dao-yun. A Cooperation Markov Decision Process System[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2020, 30(12): 8. [doi:10.3969/j.issn.1673-629X.2020.12.002]
[8] PENG Yun-jian, LIANG Jin. Q-learning Path Planning Based on Exploration/Exploitation Tradeoff Optimization[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2022, 32(04): 1. [doi:10.3969/j.issn.1673-629X.2022.04.001]
[9] QIAO Tong, ZHOU Zhou, CHENG Xin, et al. Adaptive PID Control Model of Chassis Dynamometer Based on Q-Learning[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2022, 32(05): 117. [doi:10.3969/j.issn.1673-629X.2022.05.020]
[10] WEI Jing-yi, LAI Jun, CHEN Xi-liang. Research on Hierarchical Reinforcement Learning of Intelligent Game Confrontation Based on Mutual Information[J]. COMPUTER TECHNOLOGY AND DEVELOPMENT, 2022, 32(09): 142. [doi:10.3969/j.issn.1673-629X.2022.09.022]