Computer Science > Distributed, Parallel, and Cluster Computing
[Submitted on 24 Feb 2026]
Title: Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training
Abstract: Overlapping communication with computation is crucial for distributed large-model training, yet optimizing it remains challenging, especially when computation becomes the bottleneck. We present Lagom, a system that co-tunes communication parameters to balance resource usage between computation and communication. By introducing a unified cost model and a priority-based search algorithm, Lagom reduces optimization complexity from exponential to linear. Evaluations on high- and low-bandwidth GPU clusters show that Lagom achieves 1.07-1.33x and 1.03-1.27x speedup over NCCL and AutoCCL across diverse models and parallelizations.
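The abstract does not spell out the cost model or the search procedure, so the following is only a toy sketch of why searching one parameter at a time in a fixed priority order needs a linear number of evaluations, while exhaustive tuning needs an exponential number. The parameter names, candidate values, priority order, and `toy_cost` function are all hypothetical and are not taken from Lagom, NCCL, or AutoCCL.

```python
from itertools import product

# Hypothetical communication parameters and candidate values (illustration only).
SEARCH_SPACE = {
    "num_channels": [1, 2, 4, 8, 16],
    "chunk_size_kb": [64, 128, 256, 512],
    "num_threads": [128, 256, 512],
}

# Hypothetical priority order: tune the assumed most influential parameter first.
PRIORITY = ["num_channels", "num_threads", "chunk_size_kb"]


def toy_cost(config):
    """Toy stand-in for a unified cost model (arbitrary time units).

    Assumption: giving communication more channels/threads speeds it up but
    slows the overlapped computation, and the step time is the slower of the two.
    """
    ch = config["num_channels"]
    chunk = config["chunk_size_kb"]
    th = config["num_threads"]
    compute_time = 10.0 + 0.5 * ch + th / 256.0
    comm_time = 40.0 / ch + 512.0 / chunk + 256.0 / th
    return max(compute_time, comm_time)


def exhaustive_search(space, cost_fn):
    """Evaluate every combination: the product of all candidate counts."""
    names = list(space)
    best_config, best_cost, evals = None, float("inf"), 0
    for values in product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        cost = cost_fn(config)
        evals += 1
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost, evals


def priority_search(space, priority, cost_fn):
    """Tune one parameter at a time in priority order, holding the rest fixed.

    The number of evaluations is the sum of the candidate counts.
    """
    config = {name: values[0] for name, values in space.items()}
    cost, evals = float("inf"), 0
    for name in priority:
        best_value, best_cost = config[name], float("inf")
        for value in space[name]:
            trial = dict(config, **{name: value})
            trial_cost = cost_fn(trial)
            evals += 1
            if trial_cost < best_cost:
                best_value, best_cost = value, trial_cost
        config[name] = best_value
        cost = best_cost
    return config, cost, evals


if __name__ == "__main__":
    full = exhaustive_search(SEARCH_SPACE, toy_cost)
    greedy = priority_search(SEARCH_SPACE, PRIORITY, toy_cost)
    print("exhaustive:", full)
    print("priority:  ", greedy)
```

With three parameters of 5, 4, and 3 candidates, the exhaustive search evaluates 5 x 4 x 3 = 60 configurations and the priority-ordered search evaluates 5 + 4 + 3 = 12. A one-pass search of this kind is not guaranteed to find the global optimum; how Lagom orders parameters and what its cost model captures are described in the paper, not here.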
Sources:
- https://arxiv.org/abs/2602.20656 (Latest source article published: 2026-02-25 05:00 UTC)