Semantic Parallelism: Redefining Efficient MoE Inference via Model-Data Co-Scheduling
Computer Science > Machine Learning [Submitted on 6 Mar 2025 (v1), last revised 24 Feb 2026 (this version, v4)]