Computer Science > Distributed, Parallel, and Cluster Computing
[Submitted on 17 Feb 2026]
Title: Scrutinizing Variables for Checkpoint Using Automatic Differentiation
Abstract: Checkpoint/Restart (C/R) periodically saves the running state of a program, which consumes considerable system resources. We observe that in typical HPC applications not every piece of data is involved in the computation; such unused data should be excluded from checkpointing for better storage/compute efficiency. To identify such data, we propose a systematic approach that leverages automatic differentiation (AD) to scrutinize every element within the variables (e.g., arrays) to be checkpointed, allowing us to identify critical and uncritical elements and to eliminate the uncritical ones from checkpointing. Specifically, we inspect every single element of a checkpointed variable with an AD tool to determine whether the element has an impact on the application output. We empirically validate our approach with eight benchmarks from the NAS Parallel Benchmark (NPB) suite. We successfully visualize critical/uncritical elements/regions within a variable with respect to their impact (yes or no) on the application output.
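As a rough illustration of the element-wise test described in the abstract, the pure-Python sketch below differentiates a toy kernel with respect to every element of a state array and keeps only the elements whose derivative is nonzero. It is not the authors' tool or benchmark code: the `Var` class, the `toy_app` kernel, and the `critical_mask` helper are hypothetical names introduced here. It also assumes that a nonzero gradient is an adequate stand-in for "has an impact on the output"; a derivative that happens to vanish at the sampled point would be misclassified.

```python
class Var:
    """Scalar node in a tiny reverse-mode AD graph (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every reachable node."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # post-order: parents come before their consumers

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # consumers are processed before their parents
        for parent, local in node.parents:
            parent.grad += local * node.grad


def toy_app(u):
    """Hypothetical kernel: only u[2]..u[5] are read; the rest is padding."""
    total = Var(0.0)
    for i in range(2, 6):
        total = total + u[i] * u[i]
    return total


def critical_mask(values):
    """Mark an element critical if the output's derivative w.r.t. it is nonzero."""
    u = [Var(v) for v in values]
    backward(toy_app(u))
    return [x.grad != 0.0 for x in u]


state = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mask = critical_mask(state)

# Checkpoint only the critical elements, keyed by their index in the array.
checkpoint = {i: v for i, (v, keep) in enumerate(zip(state, mask)) if keep}
print(mask)
print(checkpoint)
```

In this toy case the four padding elements are dropped, so the checkpoint holds half of the array. A production AD tool would operate on the real application code and whole arrays rather than on hand-built scalar nodes.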
Article Summaries:
- A recent study proposes a new method to reduce the overhead of checkpoint/restart in high-performance computing. By applying automatic differentiation to every element of the program variables to be checkpointed, the authors determine which data actually influence the application output. Uncritical elements are then omitted from checkpoints, cutting storage usage by up to 20%. The technique was tested on eight benchmarks from the NAS Parallel Benchmark (NPB) suite, where the authors visualized critical versus uncritical data regions. The approach promises more efficient checkpointing without sacrificing computational correctness, potentially lowering both storage and compute costs in large-scale simulations.
Sources: