Evaluating Malleable Job Scheduling in HPC Clusters using Real-World Workloads

Computer Science > Distributed, Parallel, and Cluster Computing [Submitted on 19 Feb 2026]

Abstract: Optimizing resource utilization in high-performance computing (HPC) clusters is essential for maximizing both system efficiency and user satisfaction. However, traditional rigid job scheduling often results in underutilized resources and increased job waiting times. This work evaluates the benefits of resource elasticity, where the job scheduler dynamically adjusts the resource allocation of malleable jobs at runtime. Using real workload traces from the Cori, Eagle, and Theta supercomputers, we simulate varying proportions (0-100%) of malleable jobs with the ElastiSim software. We evaluate five job scheduling strategies, including a novel one that maintains malleable jobs at their preferred resource allocation when possible. Results show that, compared to fully rigid workloads, malleable jobs yield significant improvements across all key metrics.
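
The abstract does not spell out how any of the five strategies work, so the following is only an illustrative sketch of what "keeping malleable jobs at their preferred allocation when possible" could look like. It is not the paper's algorithm and does not use the ElastiSim API. The names `MalleableJob`, `allocate`, and the fields `min_nodes`, `preferred_nodes`, and `max_nodes` are hypothetical, as is the three-pass policy itself.

```python
from dataclasses import dataclass


@dataclass
class MalleableJob:
    # Hypothetical job model: a malleable job can run on any node count
    # between min_nodes and max_nodes, and runs best at preferred_nodes.
    name: str
    min_nodes: int
    preferred_nodes: int
    max_nodes: int


def allocate(jobs, total_nodes):
    """Toy three-pass allocation (an assumption, not the paper's method).

    Pass 1 admits jobs in queue order at their minimum size.
    Pass 2 grows admitted jobs toward their preferred size.
    Pass 3 hands any leftover nodes to admitted jobs, up to their maximum.
    """
    alloc = {}
    free = total_nodes

    # Pass 1: admit each job at its minimum if enough nodes remain.
    for job in jobs:
        if job.min_nodes <= free:
            alloc[job.name] = job.min_nodes
            free -= job.min_nodes

    # Pass 2: raise admitted jobs toward their preferred allocation.
    for job in jobs:
        if job.name in alloc:
            extra = min(job.preferred_nodes - alloc[job.name], free)
            alloc[job.name] += extra
            free -= extra

    # Pass 3: expand beyond preferred only with otherwise idle nodes.
    for job in jobs:
        if job.name in alloc:
            extra = min(job.max_nodes - alloc[job.name], free)
            alloc[job.name] += extra
            free -= extra

    return alloc


queue = [
    MalleableJob("A", min_nodes=2, preferred_nodes=4, max_nodes=8),
    MalleableJob("B", min_nodes=1, preferred_nodes=2, max_nodes=4),
    MalleableJob("C", min_nodes=4, preferred_nodes=6, max_nodes=8),
]
print(allocate(queue, total_nodes=10))
```

In this toy setting, a rigid scheduler would have to leave job C waiting until its full request is free, whereas the malleable version starts it at a reduced size and can grow it later. A real simulation would also need job arrival and completion events, reconfiguration costs, and a mix of rigid and malleable jobs, none of which are modeled here.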

Sources:

https://arxiv.org/abs/2602.17318