• OpenMP remains the dominant shared‑memory programming model, but its non‑determinism hampers debugging and testing.
• Record‑and‑replay is essential for deterministic debugging of today’s parallel programs.
• Introduces Distributed Clock (DC) and Distributed Epoch (DE) recording schemes to reduce thread synchronization.
• Implemented in ReOMP, achieving 2‑5× speedups over traditional per‑access synchronization.
• Integrates with MPI record‑and‑replay via ReMPI, adding only a small runtime overhead that does not grow with MPI scale.
• Demonstrated on real HPC workloads, showing scalability for MPI+OpenMP applications.

Article Summaries:

  • A new study proposes two scalable record‑and‑replay techniques for OpenMP programs, addressing the difficulty of debugging non‑deterministic parallel code. The authors introduce the Distributed Clock (DC) and Distributed Epoch (DE) recording schemes, which reduce the need for fine‑grained thread synchronization. Implemented in the ReOMP tool, the approach achieves 2‑5× speedups over conventional methods that synchronize on every shared‑memory access. The paper also demonstrates seamless integration with MPI‑level replay tools such as ReMPI, enabling efficient replay of MPI+OpenMP applications with only a small runtime overhead that is independent of MPI scale. The work offers a practical path toward more efficient debugging of high‑performance, multi‑threaded software.
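To make the clock‑based recording idea more concrete, here is a minimal Python sketch under stated assumptions; it is not ReOMP's implementation. It assumes that every order‑sensitive shared access takes a ticket from a single logical clock and that each thread appends its tickets to its own log instead of a shared one, so the log itself needs no cross‑thread locking. During replay, a thread blocks until the shared clock reaches its next recorded ticket. The names `DistributedClockRecorder` and `run` are hypothetical, Python threads stand in for OpenMP threads, and the Distributed Epoch scheme is not modeled.

```python
import random
import threading
import time


class DistributedClockRecorder:
    """Hypothetical distributed-clock record/replay helper (illustrative only)."""

    def __init__(self, num_threads, logs=None):
        self.replaying = logs is not None
        # One log per thread; in replay mode these hold previously recorded tickets.
        if self.replaying:
            self.logs = [list(log) for log in logs]
        else:
            self.logs = [[] for _ in range(num_threads)]
        self.cursor = [0] * num_threads  # next recorded ticket per thread (replay)
        self.clock = 0                   # shared logical clock
        self.cv = threading.Condition()

    def access(self, tid, op):
        """Perform op() as one ordered shared-memory access by thread tid."""
        with self.cv:
            if self.replaying:
                ticket = self.logs[tid][self.cursor[tid]]
                self.cursor[tid] += 1
                # Wait until the global clock reaches this access's recorded ticket.
                self.cv.wait_for(lambda: self.clock == ticket)
            else:
                ticket = self.clock  # record mode: take the next ticket
            op()
            self.clock += 1
            self.cv.notify_all()
        if not self.replaying:
            # Only thread tid appends to logs[tid], so no cross-thread log lock is needed.
            self.logs[tid].append(ticket)


def run(num_threads=4, accesses=5, logs=None, jitter=0.001):
    """Run a racy workload; record if logs is None, otherwise replay the given logs."""
    rec = DistributedClockRecorder(num_threads, logs)
    shared = []  # order-sensitive shared state standing in for a racy variable

    def worker(tid):
        for i in range(accesses):
            time.sleep(random.uniform(0, jitter))  # perturb thread scheduling
            rec.access(tid, lambda: shared.append((tid, i)))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, rec.logs


if __name__ == "__main__":
    recorded, logs = run()        # record one interleaving
    replayed, _ = run(logs=logs)  # replay it under different random timing
    print(replayed == recorded)   # prints True
```

The sketch serializes each access together with its clock update under one lock so that it is easy to see it is correct. In this simplified version, only the record data is distributed across threads. A production tool would write each per‑thread log to its own file and would try to avoid serializing accesses that do not need a fixed order.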

Sources: