Optimizing Allreduce Operations for Modern Heterogeneous Architectures with Multiple Processes per GPU

Subject: Distributed, Parallel, and Cluster Computing (Computer Science). Submitted on 18 Aug 2025 (v1); last revised 24 Feb 2026 (this version, v2).

Trivance: Latency-Optimal AllReduce by Shortcutting Multiport Networks

Subject: Distributed, Parallel, and Cluster Computing (Computer Science). Submitted on 19 Feb 2026.