Deep Reinforcement Learning Approach to QoS-Aware Load Balancing in 5G Cellular Networks under User Mobility and Observation Uncertainty

Computer Science > Networking and Internet Architecture

[Submitted on 28 Oct 2025 (v1), last revised 18 Feb 2026 (this version, v2)]

Abstract: Efficient mobility management and load balancing are critical to sustaining Quality of Service (QoS) in dense, highly dynamic 5G radio access networks. We present a deep reinforcement learning framework based on Proximal Policy Optimization (PPO) for autonomous, QoS-aware load balancing implemented end-to-end in a lightweight, pure-Python simulation environment. The control problem is formulated as a Markov Decision Process in which the agent periodically adjusts Cell Individual Offset (CIO) values to steer user-cell associations. A multi-objective reward captures key performance indicators (aggregate throughput, latency, jitter, packet loss rate, Jain's fairness index, and handover count), so the learned policy explicitly balances efficiency and stability under user mobility and noisy observations. The PPO agent uses an actor-critic neural network trained from trajectories generated by the Python simulator with configurable mobility (e.g., Gauss-Markov) and stochastic measurement noise. Across 500+ training episodes and stress tests with increasing user density, the PPO policy consistently improves KPI trends (higher throughput and fairness, lower delay, jitter, packet loss, and handovers
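To make the multi-objective reward in the abstract concrete, the following is a minimal sketch, not the authors' implementation. Jain's fairness index is the standard formula; the weighted-sum form of the reward, the weight values in `SIGNED_WEIGHTS`, and the assumption that each KPI has been normalized to [0, 1] are all illustrative choices that the abstract does not specify.

```python
import numpy as np

def jain_fairness(x):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), ranging from 1/n to 1."""
    x = np.asarray(x, dtype=float)
    denom = x.size * np.sum(x ** 2)
    return float(np.sum(x) ** 2 / denom) if denom > 0 else 0.0

# Hypothetical signed weights: positive for KPIs to maximize, negative for
# KPIs to minimize. The actual weighting is not given in the abstract.
SIGNED_WEIGHTS = {
    "throughput": 1.0,
    "fairness": 1.0,
    "latency": -0.5,
    "jitter": -0.25,
    "packet_loss": -0.5,
    "handovers": -0.1,
}

def reward(kpis, weights=SIGNED_WEIGHTS):
    """Weighted sum of KPIs, each assumed to be pre-normalized to [0, 1]."""
    return float(sum(weights[name] * kpis[name] for name in weights))

# Example: fairness is computed from per-user throughputs, then combined
# with the other (already normalized) KPIs into one scalar reward.
per_user_throughput = [4.0, 4.0, 2.0, 2.0]
kpis = {
    "throughput": 0.6,
    "fairness": jain_fairness(per_user_throughput),
    "latency": 0.2,
    "jitter": 0.1,
    "packet_loss": 0.05,
    "handovers": 0.3,
}
r = reward(kpis)
```

A scalar reward of this shape lets a single PPO agent trade throughput and fairness against delay, loss, and handover churn; in practice the weights would need tuning.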

Article Summaries:

A recent study proposes a deep reinforcement learning (DRL) framework for load balancing in dense 5G networks. Using Proximal Policy Optimization (PPO), an agent learns to adjust Cell Individual Offset (CIO) values to steer user-cell associations. The problem is framed as a Markov Decision Process with a multi-objective reward that balances throughput, latency, jitter, packet loss, fairness, and handover count. Implemented in a lightweight Python simulator, the PPO policy is reported to outperform rule-based and learning-based baselines (ReBuHa, A3, CDQL) across all key performance indicators, with faster convergence and better generalization under increasing user density and noisy observations.
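How CIO values steer user-cell associations can be illustrated with a small sketch. It assumes a simplified rule in which each user attaches to the cell with the highest measured signal strength plus that cell's offset, with Gaussian noise standing in for observation uncertainty. The function name `associate`, the RSRP values, and the noise model are hypothetical; a real A3-style handover rule also involves hysteresis and time-to-trigger, which are omitted here.

```python
import numpy as np

def associate(rsrp_dbm, cio_db, noise_std_db=0.0, rng=None):
    """Return the chosen cell index per user.

    rsrp_dbm: array of shape (n_users, n_cells), true signal strength in dBm.
    cio_db:   array of shape (n_cells,), per-cell offset in dB (the agent's action).
    noise_std_db: standard deviation of Gaussian measurement noise in dB.
    """
    rng = np.random.default_rng() if rng is None else rng
    rsrp = np.asarray(rsrp_dbm, dtype=float)
    observed = rsrp + rng.normal(0.0, noise_std_db, size=rsrp.shape)
    # A positive CIO makes a cell look stronger, pulling edge users toward it.
    return np.argmax(observed + np.asarray(cio_db, dtype=float), axis=1)

# Three users, two cells (illustrative values only).
rsrp = [[-80.0, -83.0],
        [-90.0, -85.0],
        [-82.0, -84.0]]

baseline = associate(rsrp, cio_db=[0.0, 0.0])   # users pick the strongest cell
offloaded = associate(rsrp, cio_db=[0.0, 2.5])  # cell 1 biased by +2.5 dB
noisy = associate(rsrp, cio_db=[0.0, 2.5], noise_std_db=2.0,
                  rng=np.random.default_rng(0))  # same action, noisy measurements
```

With these numbers, the +2.5 dB offset moves the third user from cell 0 to cell 1 while the first user stays put, which is the kind of load shift the agent controls through its CIO actions. With noise enabled, the same action can yield different associations, which is why the policy has to be robust to measurement error.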

Sources:

https://arxiv.org/abs/2510.24869