[1] GUO Meng-zhu, SUN Jun. Congestion Control Algorithm for D3QN Based on Reinforcement Learning [J]. Computer Technology and Development, 2023, 33(02): 105-109. [doi:10.3969/j.issn.1673-629X.2023.02.016]
Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
33
Issue:
2023, No. 02
Pages:
105-109
Section:
Mobile and Internet of Things Networks
Publication Date:
2023-02-10

Article Info

Title:
Congestion Control Algorithm for D3QN Based on Reinforcement Learning
Article ID:
1673-629X(2023)02-0105-05
Author(s):
GUO Meng-zhu, SUN Jun
School of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
massive machine type communication; random access; reinforcement learning; experience replay; base station selection; deep Q-network
CLC Number:
TP181; TN929.5
DOI:
10.3969/j.issn.1673-629X.2023.02.016
Abstract:
In massive machine type communication (MTC), a large number of devices flood into the LTE-A (Long Term Evolution-Advanced) network in a short period of time. When these devices initiate random access simultaneously, they cause severe network congestion, so appropriate control measures must be taken. To this end, a reinforcement learning-based D3QN (Dueling Double Deep Q-network) algorithm is proposed. D3QN builds on DQN (Deep Q-network) with the Double and Dueling improvements, and samples training data with prioritized experience replay, which makes the algorithm converge faster and more stably. A multi-base-station scenario is considered, in which a device may send an access request to any base station in its area. In this scheme, the arrival of MTC devices using two-step random access is modeled as a memoryless Poisson process. Each base station broadcasts the number of preambles that collided, and devices use this number to shape the reward in reinforcement learning, so that MTC devices can find less congested base stations for access and reduce potential preamble collisions. The proposed scheme is compared with traditional schemes and other reinforcement learning-based schemes under different load scenarios, and the results show that it is practical and effective for solving massive-access problems.
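The three ingredients the abstract names (a dueling Q head, Double-DQN targets, and prioritized experience replay) can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the linear weights, the state and action dimensions (actions standing in for candidate base stations), and the hyperparameters are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 4, 3   # illustrative: device state features, candidate base stations
GAMMA, ALPHA = 0.9, 0.6      # discount factor, priority exponent (assumed values)

def dueling_q(w, s):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    w = (w_v, w_a): linear value and advantage weights (a stand-in for the network)."""
    w_v, w_a = w
    v = s @ w_v                       # scalar state value V(s)
    a = s @ w_a                       # advantage A(s, a), one entry per action
    return v + a - a.mean()           # mean-subtraction makes V/A identifiable

def double_dqn_target(w_online, w_target, r, s_next, done):
    """Double DQN: the online net selects the action, the target net evaluates it."""
    if done:
        return r
    a_star = int(np.argmax(dueling_q(w_online, s_next)))
    return r + GAMMA * dueling_q(w_target, s_next)[a_star]

def prioritized_sample(priorities, batch_size):
    """Prioritized experience replay: sample index i with P(i) ∝ p_i**ALPHA,
    so high-error transitions are replayed more often."""
    p = np.asarray(priorities, dtype=float) ** ALPHA
    p /= p.sum()
    return rng.choice(len(priorities), size=batch_size, p=p)
```

The mean-subtraction in the dueling head and the selection/evaluation split in the target are the standard forms of these two improvements; decoupling them is what reduces the Q-value overestimation that plain DQN suffers from.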
Last Update: 2023-02-10