Microsoft Aims to Run Most AI on Maia Chips, But GPUs Still Essential

Microsoft’s CTO says the company wants most AI workloads to run on its own Maia accelerators for better price-performance and system-level control. The first-generation Maia 100 took on some inference work (e.g., GPT-3.5) but lagged top GPUs; a stronger second-generation Maia is expected next year. Even with this pivot, Microsoft will keep buying Nvidia and AMD GPUs because customers still prefer them for many workloads.
Key Points
- Microsoft aims for most AI workloads to run on its in-house Maia accelerators to improve performance per dollar and system-level optimization.
- Kevin Scott says Nvidia has led on price-performance so far, but Microsoft wants freedom to design the full stack (compute, network, cooling).
- Maia 100 took over some GPT-3.5 inference in 2023 but trailed Nvidia/AMD GPUs; a more competitive second-gen Maia is reportedly coming next year.
- Complete replacement of Nvidia/AMD is unlikely as customers still want GPUs, similar to how Google and AWS balance custom chips with GPUs.
- Microsoft is also building Cobalt CPUs and security silicon to bolster datacenter compute and cryptography.
Sentiment
The community is divided but leans skeptical. While most agree that custom silicon makes economic sense for hyperscalers in principle, there is widespread doubt about Microsoft's specific ability to execute. Google's TPU success after a decade validates the concept, but Microsoft's late start and perceived institutional challenges make its timeline questionable. Many suspect the announcement serves more as leverage against Nvidia than as a genuine hardware roadmap.
In Agreement
- Hyperscalers building custom silicon is an inevitable economic decision — cutting out Nvidia's margin by going directly to TSMC and Broadcom makes clear financial sense
- For well-defined workloads like transformer training and inference, only a limited set of compute primitives is needed, so CUDA is not a meaningful barrier to custom chip adoption (see the sketch after this list)
- Microsoft has been working on AI accelerators since at least 2018 with Project Brainwave and Catapult FPGAs, and has existing silicon expertise in Azure
- Even just announcing custom silicon ambitions applies useful downward pricing pressure on Nvidia
- A late start may actually be beneficial since current LLM architectures have different hardware requirements than older DNNs, allowing newcomers to optimize specifically for transformers
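
To make the "limited primitives" claim concrete, here is a minimal single-head transformer block in plain NumPy. All names and dimensions are hypothetical, and it omits multi-head splitting, masking, biases, and learned layer-norm scales; the point is only that the whole block reduces to matrix multiplies, a softmax, a layer norm, and an elementwise nonlinearity, a narrow operator set a custom accelerator can target without a CUDA-sized software ecosystem.

```python
import numpy as np

d_model, d_ff, seq_len = 64, 256, 8  # illustrative sizes only

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each token's features to zero mean, unit variance.
    mu, var = x.mean(axis=-1, keepdims=True), x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU, an elementwise nonlinearity.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    # Self-attention: four matmuls plus one softmax.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_model)) @ v
    x = layer_norm(x + attn @ Wo)
    # Feed-forward: two matmuls plus one elementwise nonlinearity.
    return layer_norm(x + gelu(x @ W1) @ W2)

rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
W1, W2 = rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_ff, d_model))
print(transformer_block(x, Wq, Wk, Wv, Wo, W1, W2).shape)  # (8, 64)
```

Stacking such blocks, plus an embedding lookup and a final projection, covers the bulk of transformer inference, which is why the comment argues that supporting a handful of well-tuned kernels, rather than replicating all of CUDA, is enough for a fixed workload.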
Opposed
- Microsoft lacks the credibility and track record to deliver competitive custom silicon — hardware takes multiple generations to mature and Microsoft is institutionally challenged at execution
- The announcement may be primarily negotiating leverage against Nvidia rather than a serious hardware commitment, similar to previous announcements that produced no tangible results
- The AI bubble may burst before Microsoft's chips become competitive, making the massive investment wasted
- Interconnect design is the real bottleneck for large-scale AI training clusters, requiring rare specialized expertise that Microsoft has not demonstrated
- Building custom silicon creates a long-term talent and ecosystem maintenance burden — the CUDA developer flywheel that made Nvidia dominant is extremely hard to replicate