Distributed computing

The Landscape of GPU-Centric Communication

• GPUs dominate HPC/ML workloads, yet inter‑GPU communication remains a scalability bottleneck.
• Traditional CPU‑centric communication is being challenged by GPU‑centric models th

WANSpec: Leveraging Global Compute Capacity for LLM Inference

• WANSpec leverages under‑utilized global data centers for LLM inference to reduce latency and cost.
• Uses speculative decoding by moving the draft model to low‑demand GPUs, cutting f