Maia 200: The AI accelerator built for inference

• Maia 200: 3 nm TSMC accelerator with native FP8/FP4 tensor cores, 216 GB HBM3e, and 272 MB SRAM.
• Claims 3× the FP4 performance of Amazon Trainium and higher FP8 performance than Google TPU v7.
• 30% better performance‑per‑dollar than the latest hardware in Microsoft’s fleet.
• Will serve GPT‑5.2 and Microsoft 365 Copilot, and support synthetic data pipelines for in‑house models.
• Deployed in the US Central (Des Moines) datacenter region and expanding to US West 3 (Phoenix).
• Integrated with Azure; the Maia SDK offers PyTorch, Triton, and a low‑level programming language for developers.

Article Summaries:

Microsoft has unveiled Maia 200, a new inference accelerator built on TSMC’s 3 nm process. The chip features native FP8/FP4 tensor cores, 216 GB of HBM3e memory, 272 MB of on‑chip SRAM, and a redesigned data‑movement engine intended to raise token throughput. Maia 200 delivers over 10 petaFLOPS at 4‑bit precision (FP4) and 5 petaFLOPS at 8‑bit precision (FP8). Microsoft says it outperforms Amazon Trainium and Google’s latest TPU and improves performance‑per‑dollar by 30% over the latest hardware in its fleet. It will serve models such as GPT‑5.2, power Microsoft Foundry and Microsoft 365 Copilot, and support synthetic‑data pipelines for Microsoft’s Superintelligence team. The first units are deployed in Des Moines, with additional datacenters planned.
Microsoft has unveiled Maia 200, a new AI inference accelerator built on TSMC’s 3 nm process. The chip features native FP8/FP4 tensor cores, 216 GB of HBM3e memory, 272 MB of on‑chip SRAM, and a redesigned data‑movement engine. It delivers 10 petaFLOPS (FP4) and 5 petaFLOPS (FP8) within a 750 W TDP. Microsoft says Maia 200 outperforms Amazon’s Trainium in FP4 throughput and Google’s TPU in FP8 throughput, and offers 30% better performance‑per‑dollar than the latest hardware in its current fleet. It is deployed in U.S. datacenters, serves GPT‑5.2 and other models, and integrates with Azure through a new Maia SDK that supports PyTorch and Triton.
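
To put the headline figures in perspective, the quoted specifications can be turned into two rough numbers: peak throughput per watt, and the largest model whose weights alone would fit in HBM at each precision. The sketch below is back-of-envelope arithmetic only, not a Microsoft benchmark. It assumes the peak figures are exactly 10 and 5 petaFLOPS, that 1 GB means 10^9 bytes, and that FP4 and FP8 weights occupy 0.5 and 1 byte per parameter with no overhead for scaling factors, KV cache, or activations. The function names are illustrative.

```python
# Back-of-envelope arithmetic on the specs quoted in the summaries above.
# Inputs are vendor-stated peak figures; real workloads will achieve less.

HBM_GB = 216                              # HBM3e capacity per chip (1 GB = 1e9 bytes, assumed)
TDP_W = 750                               # stated thermal design power
PEAK_PFLOPS = {"FP4": 10.0, "FP8": 5.0}   # stated peak throughput, taken as exact
BYTES_PER_PARAM = {"FP4": 0.5, "FP8": 1.0}  # assumed: no scaling-factor overhead


def peak_tflops_per_watt(fmt: str) -> float:
    """Peak teraFLOPS per watt at the given precision."""
    return PEAK_PFLOPS[fmt] * 1000 / TDP_W


def max_params_billions(fmt: str) -> float:
    """Upper bound on parameters (in billions) whose weights alone fit in HBM.

    Ignores KV cache, activations, and runtime buffers, so a real
    deployment fits noticeably fewer parameters per chip.
    """
    return HBM_GB / BYTES_PER_PARAM[fmt]


if __name__ == "__main__":
    for fmt in ("FP4", "FP8"):
        print(
            f"{fmt}: {peak_tflops_per_watt(fmt):.1f} peak TFLOPS/W, "
            f"weights-only capacity about {max_params_billions(fmt):.0f}B parameters"
        )
```

Under these assumptions a single chip peaks at roughly 13.3 TFLOPS per watt in FP4 and 6.7 in FP8, and its HBM could hold the weights of at most about 432 billion FP4 parameters or 216 billion FP8 parameters.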

Sources:

https://blogs.microsoft.com/blog/2026/01/26/maia-200-the-ai-accelerator-built-for-inference/