• ucTrace delivers fine‑grained UCX communication traces, filling gaps left by existing MPI profilers.
• It maps UCX operations back to the originating MPI calls and covers host‑to‑device and device‑to‑device traffic.
• Interactive visualizations expose process‑ and device‑specific interactions across multi‑node CPU‑GPU clusters.
• Demonstrated on MPI point‑to‑point tests, Allreduce comparisons, linear solvers, NUMA binding, and GROMACS MD workloads.
• Helps system administrators, library developers, and application developers pinpoint communication bottlenecks and optimize performance.
• Open‑source tool available on GitHub, ready for integration into HPC performance workflows.
Article Summaries:
- Researchers have released ucTrace, a new profiling tool that captures fine‑grained communication events at the UCX (Unified Communication X) layer in high‑performance computing (HPC) systems. UCX underpins low‑latency, high‑bandwidth data transfers across CPU‑GPU clusters and is widely used as the transport layer for MPI, especially in GPU‑aware implementations. Existing profilers either miss UCX‑level details or are limited to specific MPI libraries. ucTrace links each UCX operation to the MPI call that originated it and provides interactive visualizations of host‑to‑device and device‑to‑device interactions. The authors demonstrate its utility on MPI point‑to‑point tests, Allreduce comparisons, linear solver communication, NUMA binding effects, and large‑scale GPU‑accelerated GROMACS simulations. The tool is open source and available on GitHub.
Sources: