Meta unveils four generations of custom AI chips to cut inference costs for billions of users
Meta has unveiled four new generations of custom AI chips—MTIA 300, 400, 450, and 500—designed to make AI cheaper to run across its platforms.
The chips are being developed in partnership with Broadcom and are built to make AI applications more cost-effective for the billions of users on Meta's platforms. Meta says it is following a roughly six-month development cycle per chip generation. From MTIA 300 to MTIA 500, HBM memory bandwidth increases 4.5x and peak compute jumps 25x.
MTIA 300 is optimized for ranking and recommendation models (R&R) and is already in production, according to Meta. MTIA 400 is the first generation that Meta says can compete with leading commercial products on raw performance. A rack of 72 chips forms a single scale-up domain. MTIA 400 has completed lab testing and is currently being rolled out to data centers.
MTIA 450 and 500 target generative AI inference
MTIA 450 and 500 are specifically optimized for generative AI inference. MTIA 450 doubles the HBM bandwidth compared to MTIA 400, outperforming existing commercial products, according to Meta. The chips support low-precision data formats like MX4 and MX8, which cut the computing power needed for inference without significantly hurting model quality. MTIA 500 adds another 50 percent HBM bandwidth and up to 80 percent more HBM capacity. Both chips are scheduled for mass production in 2027.
| Metric | MTIA 300 | MTIA 400 | MTIA 450 | MTIA 500 |
|---|---|---|---|---|
| Workload Focus | R&R Training | General | GenAI Inference | GenAI Inference |
| Module TDP | 800 W | 1200 W | 1400 W | 1700 W |
| HBM Bandwidth | 6.1 TB/s | 9.2 TB/s | 18.4 TB/s | 27.6 TB/s |
| HBM Capacity | 216 GB | 288 GB | 288 GB | 384-512 GB |
| MX4 Performance | - | 12 PFLOPs | 21 PFLOPs | 30 PFLOPs |
| FP8/MX8 Performance | 1.2 PFLOPs | 6 PFLOPs | 7 PFLOPs | 10 PFLOPs |
| BF16 Performance | 0.6 PFLOPs | 3 PFLOPs | 3.5 PFLOPs | 5 PFLOPs |
| Scale-up domain size | 16 | 72 | 72 | 72 |
| Scale-up network (unidirectional bandwidth*) | 1 TB/s | 1.2 TB/s | 1.2 TB/s | 1.2 TB/s |
| Scale-out network (unidirectional bandwidth*) | 200 GB/s** | 100 GB/s | 100 GB/s | 100 GB/s |
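The efficiency gain from MX4 and MX8 comes from block-wise quantization: a group of values shares a single power-of-two scale, so each element can be stored in just a few bits. The following is a toy sketch of that idea in NumPy, with an assumed block size of 32; it illustrates the concept only and does not implement the actual OCP Microscaling spec or Meta's hardware formats.

```python
import numpy as np

def mx_quantize(x, block_size=32, bits=4):
    """Toy block-wise quantization in the spirit of MX formats:
    each block of elements shares one power-of-two scale, and the
    elements themselves are stored as small signed integers."""
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    max_abs = np.where(max_abs == 0, 1.0, max_abs)
    # Shared power-of-two scale per block
    scales = 2.0 ** np.ceil(np.log2(max_abs / qmax))
    q = np.clip(np.round(blocks / scales), -qmax, qmax).astype(np.int8)
    return q, scales

def mx_dequantize(q, scales, n):
    """Reconstruct approximate float values from quantized blocks."""
    return (q.astype(np.float32) * scales).reshape(-1)[:n]

rng = np.random.default_rng(0)
x = rng.standard_normal(100).astype(np.float32)
q, scales = mx_quantize(x)
x_hat = mx_dequantize(q, scales, len(x))
```

Storing 4-bit integers plus one scale per 32-element block cuts memory traffic by roughly 4x versus FP16, which is why these formats matter for bandwidth-bound inference.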
On the software side, Meta built the chips around industry standards like PyTorch, vLLM, and Triton. Developers can port existing models to MTIA without special adaptations and run them on GPUs and MTIA at the same time. More technical details are available on Meta's blog.
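Because the stack is built on PyTorch, model code can stay device-agnostic and fall back between accelerators. The sketch below shows that pattern, assuming a PyTorch build that exposes an `mtia` backend (`torch.mtia` follows PyTorch's accelerator-backend conventions); the fallback order is illustrative.

```python
import torch
import torch.nn as nn

def pick_device():
    """Prefer MTIA if this PyTorch build exposes it, then CUDA,
    then CPU. The mtia checks are an assumption about the backend
    name; the rest is standard PyTorch device selection."""
    if hasattr(torch, "mtia") and torch.mtia.is_available():
        return torch.device("mtia")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
# The same model definition runs unchanged on any of the backends.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).to(device)
x = torch.randn(8, 16, device=device)
with torch.no_grad():
    y = model(x)
```

This is the portability claim in miniature: nothing in the model definition references a specific accelerator, so the same code path can serve GPU and MTIA fleets side by side.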
Meta also continues to work with AMD and Nvidia for GPUs. In early February 2026, Meta announced a billion-dollar deal with AMD to provide up to six gigawatts of AMD Instinct GPU computing power for Meta's AI workloads.