Microsoft Aims to Run Most AI on Maia Chips, But GPUs Still Essential

Microsoft’s CTO says the company wants most AI workloads to run on its own Maia accelerators for better price-performance and system-level control. The first-generation Maia 100 took on some inference work (e.g., GPT-3.5) but lagged top GPUs; a stronger second-generation Maia is expected next year. Even with this pivot, Microsoft will keep buying Nvidia and AMD GPUs because customers still prefer them for many workloads.
Key Points
- Microsoft aims for most AI workloads to run on its in-house Maia accelerators to improve performance per dollar and system-level optimization.
- Kevin Scott says Nvidia has led on price-performance so far, but Microsoft wants freedom to design the full stack (compute, network, cooling).
- Maia 100 took over some GPT-3.5 inference in 2023 but trailed Nvidia/AMD GPUs; a more competitive second-gen Maia is reportedly coming next year.
- Complete replacement of Nvidia/AMD is unlikely as customers still want GPUs, similar to how Google and AWS balance custom chips with GPUs.
- Microsoft is also building Cobalt CPUs and security silicon to bolster datacenter compute and cryptography.
Sentiment
The community is divided but leans skeptical. While most agree that custom silicon makes economic sense for hyperscalers in principle, there is widespread doubt about Microsoft's specific ability to execute. Google's TPU success after a decade validates the concept, but Microsoft's late start and perceived institutional challenges make its timeline questionable. Many suspect the announcement serves more as leverage against Nvidia than as a genuine hardware roadmap.
In Agreement
- Hyperscalers building custom silicon is an inevitable economic decision — cutting out Nvidia's margin by going directly to TSMC and Broadcom makes clear financial sense
- For well-defined workloads like transformer training and inference, only a limited set of compute primitives is needed, so CUDA is not a meaningful barrier to custom chip adoption (see the sketch after this list)
- Microsoft has been working on AI accelerators since at least 2018 with Project Brainwave and Catapult FPGAs, and has existing silicon expertise in Azure
- Even just announcing custom silicon ambitions applies useful downward pricing pressure on Nvidia
- A late start may actually be beneficial since current LLM architectures have different hardware requirements than older DNNs, allowing newcomers to optimize specifically for transformers
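
To make the "limited primitives" claim concrete, here is a minimal single-head transformer block in plain NumPy. All names and dimensions are hypothetical, and it omits multi-head splitting, masking, biases, and learned layer-norm scales; the point is only that the whole block reduces to matrix multiplies, a softmax, a layer norm, and an elementwise nonlinearity, a narrow operator set a custom accelerator can target without a CUDA-sized software ecosystem.

```python
import numpy as np

d_model, d_ff, seq_len = 64, 256, 8  # illustrative sizes only

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each token's features to zero mean, unit variance.
    mu, var = x.mean(axis=-1, keepdims=True), x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU, an elementwise nonlinearity.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    # Self-attention: four matmuls plus one softmax.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_model)) @ v
    x = layer_norm(x + attn @ Wo)
    # Feed-forward: two matmuls plus one elementwise nonlinearity.
    return layer_norm(x + gelu(x @ W1) @ W2)

rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
W1, W2 = rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_ff, d_model))
print(transformer_block(x, Wq, Wk, Wv, Wo, W1, W2).shape)  # (8, 64)
```

Stacking such blocks, plus an embedding lookup and a final projection, covers the bulk of transformer inference, which is why the comment argues that supporting a handful of well-tuned kernels, rather than replicating all of CUDA, is enough for a fixed workload.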
Opposed
- Microsoft lacks the credibility and track record to deliver competitive custom silicon — hardware takes multiple generations to mature and Microsoft is institutionally challenged at execution
- The announcement may be primarily negotiating leverage against Nvidia rather than a serious hardware commitment, similar to previous announcements that produced no tangible results
- The AI bubble may burst before Microsoft's chips become competitive, making the massive investment wasted
- Interconnect design is the real bottleneck for large-scale AI training clusters, requiring rare specialized expertise that Microsoft has not demonstrated
- Building custom silicon creates a long-term talent and ecosystem maintenance burden — the CUDA developer flywheel that made Nvidia dominant is extremely hard to replicate