Optimal Multi-Debris Mission Planning in LEO: A Deep Reinforcement Learning Approach with Co-Elliptic Transfers and Refueling

Computer Science > Machine Learning. Submitted on 4 Feb 2026.

Abstract: This paper addresses the challenge of multi-target active debris removal (ADR) in Low Earth Orbit (LEO) by introducing a unified co-elliptic maneuver framework that combines Hohmann transfers, safety-ellipse proximity operations, and explicit refueling logic. We benchmark three distinct planning algorithms: a Greedy heuristic, Monte Carlo Tree Search (MCTS), and deep reinforcement learning (RL) using Masked Proximal Policy Optimization (PPO), within a realistic orbital simulation environment featuring randomized debris fields, keep-out zones, and delta-V constraints. Experimental results over 100 test scenarios demonstrate that Masked PPO achieves superior mission efficiency and computational performance, visiting up to twice as many debris as Greedy and significantly outperforming MCTS in runtime. These findings underscore the promise of modern RL methods for scalable, safe, and resource-efficient space mission planning, paving the way for future advancements in ADR autonomy.

Submission history: From Agni Bandyopadhyay. [v1] Wed, 4 Feb 2026 22:15:14 UTC (1,886 KB). Browse context: cs.LG.
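As a rough illustration of the delta-V accounting that any Hohmann-based planner relies on, the sketch below computes the two impulsive burns and the transfer time between two circular, coplanar orbits. The constants, the function name `hohmann_delta_v`, and the example altitudes are assumptions chosen for illustration. They are not taken from the paper, whose co-elliptic framework also covers safety-ellipse operations and refueling.

```python
import math

# Assumed constants (standard textbook values, not taken from the paper).
MU_EARTH = 398600.4418  # km^3/s^2, Earth's gravitational parameter
R_EARTH = 6378.137      # km, Earth's equatorial radius


def hohmann_delta_v(alt1_km, alt2_km):
    """Return (dv1, dv2, time_of_flight) for a Hohmann transfer between
    two circular, coplanar orbits given by altitude. Speeds are in km/s,
    time is in seconds. Works for raising or lowering the orbit."""
    r1 = R_EARTH + alt1_km
    r2 = R_EARTH + alt2_km
    a_transfer = 0.5 * (r1 + r2)  # semi-major axis of the transfer ellipse

    # Circular speeds on the initial and final orbits.
    v_circ1 = math.sqrt(MU_EARTH / r1)
    v_circ2 = math.sqrt(MU_EARTH / r2)

    # Vis-viva speeds on the transfer ellipse at each end.
    v_trans1 = math.sqrt(MU_EARTH * (2.0 / r1 - 1.0 / a_transfer))
    v_trans2 = math.sqrt(MU_EARTH * (2.0 / r2 - 1.0 / a_transfer))

    dv1 = abs(v_trans1 - v_circ1)  # burn that enters the transfer ellipse
    dv2 = abs(v_circ2 - v_trans2)  # burn that circularizes at arrival

    # Half the orbital period of the transfer ellipse.
    time_of_flight = math.pi * math.sqrt(a_transfer ** 3 / MU_EARTH)
    return dv1, dv2, time_of_flight


# Hypothetical example: move from a 700 km to an 800 km circular orbit.
dv1, dv2, tof = hohmann_delta_v(700.0, 800.0)
print(f"total delta-V: {(dv1 + dv2) * 1000:.1f} m/s, transfer time: {tof / 60:.1f} min")
```

For this assumed 100 km altitude change the total cost is on the order of 50 m/s, which gives a sense of how quickly a fixed delta-V budget is consumed when a mission chains many such transfers.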

Article Summaries:

A recent arXiv submission proposes a new framework for planning multi-target active debris removal (ADR) missions in Low Earth Orbit (LEO). The authors introduce a unified co-elliptic maneuver scheme that combines Hohmann transfers, safety-ellipse proximity operations, and explicit refueling logic. They benchmark three planners (a Greedy heuristic, Monte Carlo Tree Search (MCTS), and a deep reinforcement learning agent trained with Masked Proximal Policy Optimization (PPO)) over 100 test scenarios in a simulated orbital environment with randomized debris fields, keep-out zones, and ΔV constraints. According to the abstract, Masked PPO achieves the best mission efficiency and computational performance: it visits up to twice as many debris objects as the Greedy baseline and runs significantly faster than MCTS. The authors conclude that modern RL methods are promising for scalable, safe, and resource-efficient ADR mission planning.
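The general idea behind action masking and a greedy baseline can be sketched in a few lines. The example below builds a feasibility mask from a remaining ΔV budget, picks the cheapest feasible target greedily, and applies the same mask to policy logits so that infeasible targets receive zero probability. All helper names and numbers are hypothetical, and the paper's actual state, mask, reward, and refueling rules are not described in the abstract, so this is only a minimal sketch of the technique.

```python
import math


def feasible_mask(dv_costs, visited, dv_remaining):
    """True where a target is unvisited and its transfer cost fits the budget."""
    return [(not seen) and cost <= dv_remaining
            for cost, seen in zip(dv_costs, visited)]


def greedy_choice(dv_costs, mask):
    """Index of the cheapest feasible target, or None if nothing is feasible."""
    candidates = [i for i, ok in enumerate(mask) if ok]
    if not candidates:
        return None
    return min(candidates, key=lambda i: dv_costs[i])


def masked_softmax(logits, mask):
    """Softmax over logits with masked-out actions forced to probability 0."""
    valid = [logit for logit, ok in zip(logits, mask) if ok]
    if not valid:
        raise ValueError("no feasible action")
    peak = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) if ok else 0.0
            for logit, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scenario: four debris targets, one already visited.
dv_costs = [0.12, 0.05, 0.30, 0.08]   # km/s to reach each target (made up)
visited = [False, True, False, False]
mask = feasible_mask(dv_costs, visited, dv_remaining=0.15)

print(mask)                           # [True, False, False, True]
print(greedy_choice(dv_costs, mask))  # 3
print(masked_softmax([1.0, 5.0, 2.0, 1.0], mask))  # [0.5, 0.0, 0.0, 0.5]
```

In this toy scenario, target 1 has the highest logit but is already visited, so the mask removes it before sampling. This is the mechanism that keeps a masked policy from selecting actions that violate constraints.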

Sources:

https://arxiv.org/abs/2602.17685