The Landscape of GPU-Centric Communication
• GPUs dominate HPC/ML workloads, yet inter‑GPU communication remains a scalability bottleneck. • Traditional CPU‑centric communication is being challenged by GPU‑centric models th
• ucTrace delivers fine‑grained UCX communication traces, filling gaps left by existing MPI profilers. • It maps UCX operations back to originating MPI calls, linking host‑to‑devic
• OpenMP remains dominant but its non‑determinism hampers debugging and testing. • Record‑and‑replay is essential for deterministic debugging in today’s parallel programs. • Introd