• GPUs dominate HPC/ML workloads, yet inter‑GPU communication remains a scalability bottleneck.
• Traditional CPU‑centric communication is being challenged by GPU‑centric models that reduce CPU involvement.
• The paper catalogs vendor mechanisms (NVLink, PCIe, InfiniBand) and user‑level libraries (NCCL, GPU‑aware MPI).
• It defines key terminology and classifies communication strategies within and across nodes.
• Performance insights highlight trade‑offs between bandwidth, latency, and programming complexity.
• Open research questions focus on memory consistency, fault tolerance, and heterogeneous system integration.
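
The bandwidth and latency trade-off mentioned above can be made concrete with the standard alpha-beta cost model, T(n) = α + n/β, where α is a fixed per-message startup cost and β is the sustained bandwidth. The sketch below is not taken from the paper: it assumes this simple model, ignores contention and pipelining, and uses hypothetical parameter values (not measurements) only to show why small messages are latency-bound and why removing a host-staging hop, as GPU-centric designs aim to do, can matter. The names `Link`, `staged_time`, and `latency_fraction` are illustrative.

```python
"""Alpha-beta (latency-bandwidth) cost model for point-to-point GPU transfers.

Illustrative sketch only: the model and all parameter values are assumptions,
not results from the surveyed paper or measurements of any real hardware.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hop on the data path: a fixed startup cost plus a bandwidth."""
    name: str
    latency_s: float      # alpha: per-message startup cost in seconds
    bandwidth_Bps: float  # beta: sustained bandwidth in bytes per second

    def transfer_time(self, nbytes: int) -> float:
        """T(n) = alpha + n / beta for a single message of nbytes."""
        return self.latency_s + nbytes / self.bandwidth_Bps


def staged_time(hops: list[Link], nbytes: int) -> float:
    """Store-and-forward through several hops (e.g. GPU -> host -> GPU), no pipelining."""
    return sum(hop.transfer_time(nbytes) for hop in hops)


def latency_fraction(link: Link, nbytes: int) -> float:
    """Share of the total transfer time spent on the fixed startup cost."""
    return link.latency_s / link.transfer_time(nbytes)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to illustrate the shape of the trade-off.
    host_hop = Link("host-staged hop", latency_s=10e-6, bandwidth_Bps=16e9)
    direct = Link("direct GPU-to-GPU link", latency_s=5e-6, bandwidth_Bps=100e9)
    for nbytes in (1 << 10, 1 << 20, 1 << 30):
        t_staged = staged_time([host_hop, host_hop], nbytes)
        t_direct = direct.transfer_time(nbytes)
        print(
            f"{nbytes:>11} B  staged {t_staged * 1e6:10.1f} us  "
            f"direct {t_direct * 1e6:10.1f} us  "
            f"latency share (direct) {latency_fraction(direct, nbytes):5.1%}"
        )
```

With these made-up values, a 1 KiB message spends almost all of its time in the fixed startup cost, while a 1 GiB message is dominated by bandwidth. Real communication libraries typically split large messages into chunks and pipeline them across hops, so the non-pipelined staged estimate here is pessimistic.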

Article Summaries:

  • The paper “The Landscape of GPU‑Centric Communication” reviews how modern GPUs increasingly handle their own inter‑device communication, reducing reliance on the CPU. It surveys vendor‑specific mechanisms and user‑level libraries that enable efficient multi‑GPU data transfer and memory management within and across nodes. The authors define key terminology, classify existing approaches, and discuss performance trade‑offs and challenges. The work also outlines emerging research directions and open questions, aiming to guide researchers, developers, and library designers in optimizing multi‑GPU systems for high‑performance computing and machine‑learning workloads.

Sources: