Nvidia releases free LLMs that match GPT-4 in some benchmarks

Nvidia has released Nemotron-4 340B, an open-source pipeline for generating synthetic data. The model family is designed to help developers create high-quality datasets for training and fine-tuning large language models (LLMs) for commercial applications.

The Nemotron-4 340B family consists of a base model, an instruction model, and a reward model, which together form a pipeline for generating synthetic data that can be used to train and refine LLMs. The base model was trained on 9 trillion tokens.

Synthetic data mimics the properties of real data and can improve data quality and quantity, which is particularly important when access to large, diverse, and annotated datasets is limited.

According to Nvidia, the Nemotron-4 340B Instruct model generates diverse synthetic data that can improve the performance and robustness of customized LLMs in various application areas such as healthcare, finance, manufacturing, and retail.

The Nemotron-4 340B Reward model can further improve the quality of the AI-generated data by filtering for high-quality responses.

Nemotron-4 340B Instruct first generates domain-specific, synthetic training texts. The second model, Nemotron-4 340B Reward, then evaluates these generated texts and provides feedback to gradually improve them. The interaction between the two models produces higher-quality training data over time. | Image: Nvidia
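The generate-then-filter loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Nvidia's implementation: `build_synthetic_dataset`, `toy_generate`, `toy_score`, the number of candidates per prompt, and the score threshold are all hypothetical. In practice, the generator would be a call to an instruct model and the scorer a call to a reward model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    prompt: str
    response: str
    score: float


def build_synthetic_dataset(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    candidates_per_prompt: int = 4,
    min_score: float = 0.5,
) -> List[Sample]:
    """Keep the best-scoring candidate per prompt if it clears min_score."""
    dataset: List[Sample] = []
    for prompt in prompts:
        # Step 1: the generator proposes several candidate responses.
        candidates = generate(prompt, candidates_per_prompt)
        if not candidates:
            continue
        # Step 2: the scorer rates each candidate.
        scored = [Sample(prompt, c, score(prompt, c)) for c in candidates]
        # Step 3: keep only the best candidate, and only if it is good enough.
        best = max(scored, key=lambda s: s.score)
        if best.score >= min_score:
            dataset.append(best)
    return dataset


# Toy stand-ins (assumptions) so the sketch runs without any model.
def toy_generate(prompt: str, n: int) -> List[str]:
    return [f"{prompt} " + "detail " * i for i in range(n)]


def toy_score(prompt: str, response: str) -> float:
    # Pretend that longer answers are better, capped at 1.0.
    return min(len(response.split()) / 10.0, 1.0)


if __name__ == "__main__":
    for sample in build_synthetic_dataset(["Explain FP8", "x"], toy_generate, toy_score):
        print(sample)
```

Swapping the two toy functions for real model calls leaves the loop unchanged, which is why the generator and scorer are passed in as parameters.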

98 percent of the training data used to fine-tune the Instruct model is synthetic and was created using Nvidia's pipeline.

In benchmarks such as MT-Bench, MMLU, GSM8K, HumanEval, and IFEval, the Instruct model generally performs better than other open-source models such as Llama-3-70B-Instruct, Mixtral-8x22B-Instruct-v0.1, and Qwen-2-72B-Instruct, and in some tests, it even outperforms GPT-4o.

The three Nemotron models are among the top open models. However, they have significantly more parameters than their competitors, which might make them less efficient in comparison. | Image: Nvidia

It also performs comparably to or better than OpenAI's GPT-4-1106 in human evaluation for various text tasks such as summarization and brainstorming. Detailed benchmarks are available in the technical report. According to Nvidia, the models run on DGX H100 systems with eight GPUs at FP8 precision.

Nvidia's Nemotron-4 340B Instruct model is on par with GPT-4-1106 in text task benchmarks. | Image: Nvidia
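A rough back-of-the-envelope calculation shows why FP8 precision matters for fitting a 340-billion-parameter model on a single eight-GPU system. The sketch below counts model weights only and assumes 80 GB of memory per H100, a figure the article does not state. It ignores the KV cache, activations, and framework overhead, so real requirements are higher.

```python
PARAMS = 340e9            # parameter count from the model name
GPUS = 8                  # GPUs in one DGX H100 system
GPU_MEMORY_GB = 80        # assumption: 80 GB per H100, not stated in the article


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


fp8_gb = weight_memory_gb(PARAMS, 1)    # FP8 stores one byte per parameter
bf16_gb = weight_memory_gb(PARAMS, 2)   # BF16 stores two bytes per parameter
total_gb = GPUS * GPU_MEMORY_GB

print(f"FP8 weights:  {fp8_gb:.0f} GB")
print(f"BF16 weights: {bf16_gb:.0f} GB")
print(f"Total GPU memory: {total_gb} GB")
```

Under these assumptions, the FP8 weights leave headroom on one system, while 16-bit weights alone would exceed its combined GPU memory.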

The models are optimized for inference with the open-source framework Nvidia NeMo and the Nvidia TensorRT-LLM library. Nvidia makes them available under its Open Model License, which also allows for commercial use. All data is available on Hugging Face.

Framing Nemotron as a synthetic data generator looks like a strategic move by Nvidia: instead of positioning it as a competitor to Llama 3 or GPT-4, the company presents the model family as a tool that helps other developers train better or more models in different domains. More training and more models on the market mean more demand for GPUs.