Nvidia and its partners have announced a competition to advance hardware design with large language models (LLMs).
According to Nvidia, current LLMs such as GPT-4 still struggle to generate practical hardware designs without human intervention, mainly because the models saw too little hardware-specific code during training.
The competition aims to create a comprehensive, high-quality open-source dataset of Verilog code for training LLMs. The goal is to spark an "ImageNet-like revolution in LLM-based hardware code generation," as stated on the competition website.
Nvidia researcher Jim Fan says the company is "very interested" in automating the design process for its next-generation GPUs. The idea is a feedback loop: better GPUs yield more intelligence per unit of training time, more intelligent models write better code, and that code in turn enables the design of even more advanced GPUs.
"Some day, we can take a vacation and NVIDIA will still keep shipping new chips. Time to kickstart a self-bootstrapping, exponential loop that iterates over both hardware and models," Fan writes.
The competition has two phases. In the first phase, participants should collect or generate Verilog code examples to expand the existing MG Verilog dataset, focusing on scalable methods.
In the second phase, participants will receive the complete dataset with all samples submitted in phase one. They will work on improving the dataset's quality through data cleansing and label generation, emphasizing automated methods.
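To make the phase-two goal concrete, here is a minimal sketch of one automated cleaning step: deduplicating Verilog samples that differ only in comments or whitespace. The function names, normalization rules, and sample strings are illustrative assumptions, not part of Nvidia's starter kit.

```python
import hashlib
import re

def normalize_verilog(src: str) -> str:
    """Strip comments and collapse whitespace so trivially
    different copies of the same module hash identically."""
    src = re.sub(r"//[^\n]*", "", src)               # drop line comments
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.S)  # drop block comments
    return re.sub(r"\s+", " ", src).strip()

def dedupe(samples: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized sample."""
    seen, unique = set(), []
    for s in samples:
        h = hashlib.sha256(normalize_verilog(s).encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(s)
    return unique

samples = [
    "module and2(input a, b, output y); assign y = a & b; endmodule",
    "// duplicate with a comment\nmodule and2(input a, b,  output y); assign y = a & b; endmodule",
    "module or2(input a, b, output y); assign y = a | b; endmodule",
]
print(len(dedupe(samples)))  # 2
```

A real pipeline would go further (syntax checking with a Verilog parser, near-duplicate detection, label generation), but exact-match dedup after normalization is a typical, fully automatable first pass.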
Contributions will be judged based on the improvement the submitted data brings to a fine-tuned CodeLlama 7B-Instruct model. Nvidia will provide contestants with a starter kit that includes a base dataset, sample data, and code to fine-tune the LLM.
Registration for the competition closes at the end of July. The results will be presented at the International Conference on Computer-Aided Design at the end of October.