
Intel officially introduced its new AI accelerator Gaudi 3 at Vision 2024.


According to the company, Gaudi 3 is expected to cut training time for the 7B and 13B parameter Llama 2 models and the 175B parameter GPT-3 model by about 50% compared to the Nvidia H100. Gaudi 3 is also expected to outperform the H100 and H200 in inference throughput by about 50% and 30% respectively, on average, depending on the model.

The standard Gaudi 3 features 96 MB of on-chip SRAM cache with 12.8 TB/s bandwidth and 128 GB of HBM2e memory with 3.7 TB/s peak bandwidth. According to Intel, the chip offers twice the FP8 and four times the BF16 processing power of its predecessor, Gaudi 2, as well as twice the network bandwidth and 1.5 times the memory bandwidth. The 5 nm AI accelerator is also said to be significantly cheaper than the H100. However, Nvidia already has a newer offering in the Blackwell architecture.
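As a quick plausibility check of the "1.5 times" figure: assuming Gaudi 2's published peak HBM2e bandwidth of roughly 2.45 TB/s (a number from Intel's earlier spec sheets, not from this announcement), the stated 3.7 TB/s works out to about a 1.5x gain.

```python
# Sanity check of Intel's "1.5x memory bandwidth" claim.
# The Gaudi 2 figure (~2.45 TB/s) is an assumption taken from
# earlier Intel spec sheets, not from this announcement.
gaudi3_hbm_tbps = 3.7   # stated Gaudi 3 peak HBM2e bandwidth
gaudi2_hbm_tbps = 2.45  # assumed Gaudi 2 peak HBM2e bandwidth
print(f"{gaudi3_hbm_tbps / gaudi2_hbm_tbps:.2f}x")  # -> 1.51x
```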

Intel plans open platform for AI

Gaudi 3 will let companies scale AI systems flexibly, from single nodes to mega-clusters with tens of thousands of accelerators. Intel is relying on open, community-based software and standardized Ethernet networking. Gaudi 3 will be available to OEMs including Dell, HPE, Lenovo, and Supermicro beginning in Q2 2024. New customers and partners for the Intel Gaudi accelerator, including Airtel, Bosch, IBM, NAVER, and SAP, were also announced.


Intel also plans to create an open platform for enterprise AI with partners including SAP, Red Hat, and VMware. The goal is to accelerate the adoption of secure GenAI systems based on the retrieval-augmented generation (RAG) approach, which combines proprietary data sources with open-source language models; a minimal sketch of the pattern follows.
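The RAG pattern itself is easy to sketch. Below is a minimal Python example of the core loop: embed a query, retrieve the most similar proprietary documents by cosine similarity, and prepend them to the prompt for the language model. The `embed` and `generate` functions here are hypothetical stand-ins for any embedding model and open-source LLM; they are not part of any Intel or partner API.

```python
import numpy as np

# Hypothetical stubs: swap in a real embedding model and a real
# open-source LLM. These names are illustrative, not an actual API.
def embed(text: str) -> np.ndarray:
    """Return a unit-length embedding vector for `text` (stub)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    """Call an open-source LLM with the grounded prompt (stub)."""
    return f"[model answer grounded in a prompt of {len(prompt)} chars]"

# Index proprietary documents once.
documents = [
    "Internal FAQ: warranty claims are handled by the service desk.",
    "Policy: customer data must stay in the EU region.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def rag_answer(query: str, top_k: int = 2) -> str:
    """Retrieve the most relevant documents and ground the LLM on them."""
    scores = doc_vectors @ embed(query)      # cosine similarity (unit vectors)
    best = np.argsort(scores)[::-1][:top_k]  # indices of top-k documents
    context = "\n".join(documents[i] for i in best)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

print(rag_answer("Who handles warranty claims?"))
```

In production, the stubs would be backed by a real embedding model and a vector store, but the retrieve-then-generate structure stays the same.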

Summary
  • Intel has unveiled its new Gaudi 3 AI accelerator, which the company claims will reduce training time for large language models by about 50% compared to Nvidia's H100 and outperform the H100 and H200 in inference throughput.
  • Built on a 5nm process, the chip has 96MB of on-chip SRAM cache, 128GB of HBM2e memory, and offers double the FP8 and quadruple the BF16 processing power over its predecessor, as well as increased network and memory bandwidth.
  • Gaudi 3 will be available to OEMs in Q2 2024.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.