ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments

Computer Science > Distributed, Parallel, and Cluster Computing [Submitted on 24 Feb 2026]