WU Xi, REN Zheng-guo, SUN Jun. Resource Allocation Algorithm for Heterogeneous Ultra-dense Networks Based on Reinforcement Learning[J]. Computer Technology and Development, 2023, 33(01): 114-120. [doi:10.3969/j.issn.1673-629X.2023.01.018]

Resource Allocation Algorithm for Heterogeneous Ultra-dense Networks Based on Reinforcement Learning

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume: 33
Issue: 2023, No. 01
Pages: 114-120
Section: Mobile and IoT Networks
Publication Date: 2023-01-10

Article Info

Title:
Resource Allocation Algorithm for Heterogeneous Ultra-dense Networks Based on Reinforcement Learning
Article No.:
1673-629X(2023)01-0114-07
Authors:
WU Xi (吴锡), REN Zheng-guo (任正国), SUN Jun (孙君)
Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
heterogeneous ultra-dense network; reinforcement learning; resource allocation; power allocation; quality of service (QoS)
CLC Number:
TP39
DOI:
10.3969/j.issn.1673-629X.2023.01.018
Abstract:
In order to guarantee the quality of service (QoS) of downlink users and improve the spectrum efficiency (SE) and energy efficiency (EE) of heterogeneous ultra-dense networks, a joint spectrum and power allocation algorithm based on multi-agent deep reinforcement learning (DRL) is proposed. First, the resource allocation optimization function is formulated with spectrum efficiency and energy efficiency as the optimization objectives and user QoS as the constraint. Then, the state space, reward, and action space of the multi-agent users are defined; the state space information is collected with a small communication overhead as one-dimensional data, which reduces the amount of input data to the network, and each user relies on its own channel state information (CSI) rather than the global CSI to derive its spectrum and power allocation policy. Finally, the optimal resource allocation policy is found by training a deep neural network. Simulation results show that the proposed algorithm achieves fast convergence; compared with the greedy algorithm and other reinforcement learning methods, the energy efficiency is improved by more than 20%, and the spectrum efficiency is improved by 27% and 11%, respectively.
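The workflow sketched in the abstract — each agent observing only its own one-dimensional local CSI as the state and learning a joint (subchannel, transmit-power) action with a deep Q-network — can be illustrated with a minimal example. The code below is not the authors' implementation: the number of subchannels and power levels, network size, reward, and hyperparameters are assumptions chosen only to make the sketch self-contained, and PyTorch is used for the Q-network.

# Illustrative single-agent sketch (assumed setup, not the paper's code):
# a small DQN maps a one-dimensional local-CSI state to Q-values over
# joint (subchannel, power-level) actions.
import random
import numpy as np
import torch
import torch.nn as nn

N_CHANNELS = 4            # assumed number of subchannels
N_POWER_LEVELS = 5        # assumed discrete transmit-power levels
STATE_DIM = N_CHANNELS    # one-dimensional local CSI vector (assumption)
N_ACTIONS = N_CHANNELS * N_POWER_LEVELS  # joint spectrum/power action space

class QNet(nn.Module):
    """Small fully connected Q-network: local CSI -> Q-value per joint action."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, state):
        return self.layers(state)

def select_action(qnet, state, epsilon=0.1):
    """Epsilon-greedy choice of a joint (channel, power-level) index."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        q = qnet(torch.as_tensor(state, dtype=torch.float32))
    return int(torch.argmax(q).item())

def td_update(qnet, optimizer, state, action, reward, next_state, gamma=0.9):
    """One-step temporal-difference update of the Q-network."""
    q_sa = qnet(torch.as_tensor(state, dtype=torch.float32))[action]
    with torch.no_grad():
        target = reward + gamma * qnet(
            torch.as_tensor(next_state, dtype=torch.float32)).max()
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.item())

if __name__ == "__main__":
    qnet = QNet()
    optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
    state = np.random.rand(STATE_DIM)   # stand-in for a measured local CSI vector
    for step in range(100):
        action = select_action(qnet, state)
        # Decode the joint action into the chosen subchannel and power level.
        channel, power_level = divmod(action, N_POWER_LEVELS)
        # Placeholder reward: in the paper the reward reflects EE/SE under a
        # QoS constraint; here a random value merely exercises the update.
        reward = float(np.random.rand())
        next_state = np.random.rand(STATE_DIM)
        td_update(qnet, optimizer, state, action, reward, next_state)
        state = next_state

Encoding the (subchannel, power-level) pair as a single discrete index keeps one action head per agent; in a multi-agent deployment each user would run its own copy of this loop on its local CSI, matching the decentralized setting described in the abstract.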

Last Update: 2023-01-10