Reinforcement Learning for Optimizing Wireless Networks
Abstract
Recent advances in reinforcement learning (RL) have made online RL a viable tool for wireless radio resource management (RRM). However, online RL algorithms require direct interaction with the environment, which may be undesirable because the exploration inherent to RL can degrade live network performance. This study instead examines offline RL methods for the RRM problem. We evaluate several state-of-the-art offline RL algorithms, including batch-constrained Q-learning (BCQ), conservative Q-learning (CQL), and implicit Q-learning (IQL), on a concrete RRM task: user scheduling to maximize a linear combination of the sum rate and the 5th-percentile rate. We find that the effectiveness of offline RL for this problem depends strongly on the behavior policy used during data collection. We then propose a novel offline RL approach that exploits heterogeneous datasets gathered under multiple behavior policies, and we show that a suitable mixture of such datasets allows offline RL to recover a near-optimal policy even though every contributing behavior policy is significantly suboptimal.
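The scheduling objective described above can be sketched as a simple scoring function. This is an illustrative reconstruction, not the paper's implementation: the function name `rrm_objective` and the trade-off weight `lam` are assumptions introduced here for clarity.

```python
import numpy as np

def rrm_objective(user_rates, lam=1.0):
    """Score a scheduling outcome by a linear combination of the
    sum rate and the 5th-percentile rate across users.

    `lam` is a hypothetical trade-off weight between throughput
    (sum rate) and fairness (5th-percentile rate); the paper's
    actual weighting is not specified in the abstract.
    """
    user_rates = np.asarray(user_rates, dtype=float)
    sum_rate = user_rates.sum()                  # total throughput
    pct5_rate = np.percentile(user_rates, 5)     # worst-case (fairness) term
    return sum_rate + lam * pct5_rate
```

An RL scheduler would use this quantity (or its per-step increment) as the reward signal, so that policies are driven to balance aggregate throughput against the experience of the worst-served users.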
Article Details

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.