The Landscape of GPU-Centric Communication

• GPUs dominate HPC/ML workloads, yet inter‑GPU communication remains a scalability bottleneck.
• Traditional CPU‑centric communication is being challenged by GPU‑centric models th

ucTrace: A Multi-Layer Profiling Tool for UCX-driven Communication

• ucTrace delivers fine‑grained UCX communication traces, filling gaps left by existing MPI profilers.
• It maps UCX operations back to originating MPI calls, linking host‑to‑devic

Distributed Order Recording Techniques for Efficient Record-and-Replay of Multi-threaded Programs

• OpenMP remains dominant but its non‑determinism hampers debugging and testing.
• Record‑and‑replay is essential for deterministic debugging in today’s parallel programs.
• Introd