Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare

• NVIDIA Run:ai v2.24 introduces time-based fairshare scheduling for Kubernetes GPU clusters.
• Scheduler tracks historical GPU usage, adjusting queue scores to balance long-term resource allocation.
• Equal-priority teams avoid starvation; large jobs receive timely burst access without being blocked.
• Existing quotas and priorities remain intact, ensuring backward compatibility.
• Enables proportional compute time over days/weeks, aligning with weekly/monthly GPU-hour budgets.
• Improves cluster utilization and resource planning for enterprise GPU workloads.

Article Summaries:

NVIDIA Run:ai v2.24 adds a time-based fairshare scheduling mode to its KAI Scheduler, addressing long-standing fairness issues in shared GPU clusters. The new feature tracks historical GPU usage instead of evaluating fairness at a single instant, giving lower scores to teams that have recently consumed more resources and boosting those that have been waiting. This approach ensures proportional compute time over days or weeks while preserving existing guaranteed quotas and queue priorities. The result is more balanced GPU allocation, enabling burst access for large jobs and better alignment with weekly or monthly GPU-hour budgets.
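
The scoring idea can be illustrated with a small Python sketch. This is not the KAI Scheduler's actual algorithm or API. The `Queue` class, the `weight` and `usage` fields, the exponential half-life decay, and the `2 * entitled - consumed` adjustment are all assumptions chosen for illustration. The sketch only covers how contested (over-quota) GPUs might be divided; guaranteed quotas and queue priorities are assumed to be handled before this step and are left untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical queue record; field names are illustrative only."""
    name: str
    weight: float                               # relative over-quota weight, must be > 0
    usage: list = field(default_factory=list)   # GPU-hours per period, oldest first


def decayed_usage(usage, half_life=4.0):
    """Sum past usage, halving a period's influence every `half_life` periods."""
    newest = len(usage) - 1
    return sum(u * 0.5 ** ((newest - i) / half_life) for i, u in enumerate(usage))


def contested_shares(queues, half_life=4.0):
    """Split contested GPUs so that recent heavy users get less, waiting queues more.

    Assumes every queue's usage list covers the same periods.
    """
    total_weight = sum(q.weight for q in queues)
    used = {q.name: decayed_usage(q.usage, half_life) for q in queues}
    total_used = sum(used.values())

    raw = {}
    for q in queues:
        entitled = q.weight / total_weight
        # With no history at all, treat each queue as having used exactly its entitlement.
        consumed = used[q.name] / total_used if total_used > 0 else entitled
        # Shift the share by the gap between entitlement and historical consumption.
        raw[q.name] = max(2 * entitled - consumed, 0.0)

    norm = sum(raw.values())  # positive whenever all weights are positive
    return {name: value / norm for name, value in raw.items()}


# Two equal-weight teams; team-a has consumed four times as much as team-b.
queues = [
    Queue("team-a", weight=1.0, usage=[80.0, 80.0]),
    Queue("team-b", weight=1.0, usage=[20.0, 20.0]),
]
shares = contested_shares(queues)
```

In this example the two teams have equal weight, but team-b ends up with the larger share of contested GPUs (about 80% versus 20%) because team-a dominated recent usage. As team-a's older usage decays, the shares drift back toward an even split.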

Sources:

https://developer.nvidia.com/blog/ensuring-balanced-gpu-allocation-in-kubernetes-clusters-with-time-based-fairshare/