
MPT-7B: The best open-source LLM available for commercial use

Image: Midjourney prompted by THE DECODER

Key Points

  • MosaicML releases MPT-7B, a language model that matches the performance of Meta's LLaMA 7B and is open-source.
  • MPT-7B and its variants are released for commercial use.
  • In addition to a Chat and Instruct model, there is also a StoryWriter model with a context window of 65,000 tokens.

MosaicML releases the best open-source language model yet, licensed for commercial use. One variant can even handle entire books.

MosaicML's MPT-7B is a large language model with nearly 7 billion parameters, which the team trained on its own dataset of nearly a trillion tokens.

MosaicML followed the training regimen of Meta's LLaMA model. The training cost nearly $200,000 and took 9.5 days using the MosaicML platform.

MosaicML MPT-7B is the best open-source model yet

According to MosaicML, MPT-7B matches the performance of Meta's 7-billion-parameter LLaMA model, making it the first open-source model to reach that level, ahead of OpenLLaMA.

MosaicML's MPT-7B achieves the quality of Meta's LLaMA 7B. | Image: MosaicML

Unlike Meta's models, however, MPT-7B is licensed for commercial use.

In addition to the base "MPT-7B" model, MosaicML releases three variants: MPT-7B-StoryWriter-65k+, MPT-7B-Instruct, and MPT-7B-Chat.

MosaicML releases a language model with a 65,000-token context window

MPT-7B-Instruct is a model for following instructions, and the Chat model is a chatbot variant in the style of Alpaca or Vicuna.
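
For readers who want to try the Instruct variant, here is a minimal Python sketch of how it could be queried with Hugging Face's transformers library. The model name "mosaicml/mpt-7b-instruct", the GPT-NeoX tokenizer, and the Alpaca-style prompt template follow MosaicML's Hub release, but treat them as assumptions and verify against the model card.

    # Minimal sketch: prompting MPT-7B-Instruct via Hugging Face transformers.
    import torch
    import transformers

    name = "mosaicml/mpt-7b-instruct"  # assumed Hub model name
    model = transformers.AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,  # MPT ships custom modeling code on the Hub
    )
    # MPT models reuse the GPT-NeoX tokenizer rather than shipping their own.
    tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

    # Alpaca-style instruction template (assumed from the model card).
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n"
        "### Instruction:\nExplain what a context window is.\n"
        "### Response:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))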

With MPT-7B-StoryWriter-65k+, MosaicML also releases a model that can read and write stories with very long contexts. To this end, MPT-7B was fine-tuned with a context length of 65,000 tokens on a subset of the books3 dataset. For comparison, OpenAI's largest GPT-4 variant handles 32,000 tokens.

According to MosaicML, the model can scale beyond 65,000 tokens with some optimizations; the team has demonstrated up to 84,000 tokens on a single node of Nvidia A100-80GB GPUs. Even at 65,000 tokens, the model was able to read an entire novel and write an epilogue.

The StoryWriter model can handle The Great Gatsby. | Image: MosaicML
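
The long context is possible because MPT uses ALiBi (Attention with Linear Biases) instead of positional embeddings, so the maximum sequence length is a configuration value rather than a hard architectural limit. As a minimal sketch, assuming the "max_seq_len" config field and model name from MosaicML's Hub release, the window can be raised at load time:

    # Minimal sketch: loading MPT-7B-StoryWriter-65k+ with a context window
    # raised past its 65k-token fine-tuning length (names assumed from the
    # Hub model card; verify before use).
    import torch
    import transformers

    name = "mosaicml/mpt-7b-storywriter"
    config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
    config.max_seq_len = 83968  # ALiBi lets the model extrapolate beyond 65k

    model = transformers.AutoModelForCausalLM.from_pretrained(
        name,
        config=config,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    )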

All MPT-7B models are available on GitHub.

Source: MosaicML