Computer Science > Distributed, Parallel, and Cluster Computing
[Submitted on 24 Feb 2026]

Title: ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments

Abstract: As LLM deployments scale over more hardware, the probability of a single failure in a system increases significantly, and cloud operators must consider robust countermeasures to handle these inevitable failures. A common recovery approach is to simply restart the LLM serving instance; however, this is costly in model-as-a-service (MaaS) inference settings, where reloading model weights and recompiling computation graphs can introduce significant delays to incoming requests. We propose ReviveMoE, a method for rapid failure recovery in large-scale LLM deployments without restarting the serving instance. ReviveMoE is designed to support both the traditional LLM architecture, which collocates MoE and attention on the same hardware, and the disaggregated architectures, which separate MoE from attention. Integrated into Huawei Cloud's MaaS, ReviveMoE is built on top of Huawei's xDeepServe serving platform and the XCCL communications library.


Sources: