
In May, MosaicML released what was then one of the best open-source language models, and now the startup is following up with a bigger and more powerful version.

Following MPT-7B, MosaicML has released MPT-30B, its second major open-source language model. The 30-billion-parameter model, MosaicML claims, surpasses the performance of OpenAI's GPT-3 despite having about one-sixth as many parameters.

In some areas, such as coding, it is said to outperform open-source models like Meta's LLaMA or Falcon; in others, it is on par or slightly worse. As usual with vendor benchmarks, these claims are difficult to verify independently at this point. Like its predecessor, MPT-30B can be used commercially and comes in two variants: MPT-30B-Instruct, a model trained to follow short instructions, and the chatbot model MPT-30B-Chat.

MPT-30B comes with a longer context window

MPT-30B has also been trained on longer sequences (up to 8,000 tokens) than GPT-3, LLaMA, or Falcon (2,000 tokens each). The context length, half that of the latest "GPT-3.5-turbo" variant, makes the model well suited for use cases where a lot of text or code needs to be processed at once. However, with additional optimization, the sequence length could easily be doubled during fine-tuning or inference, according to MosaicML.
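As a rough sketch of how that extension might look in practice: assuming the checkpoint is published on the Hugging Face Hub as mosaicml/mpt-30b and exposes a max_seq_len setting in its custom config, as MosaicML's earlier MPT releases did, the context window could be enlarged at load time rather than at training time.

```python
# Minimal sketch: loading MPT-30B with an extended context window.
# The repo id and the max_seq_len attribute are assumptions based on
# MosaicML's published MPT checkpoints, not confirmed by this article.
import transformers

name = "mosaicml/mpt-30b"  # assumed Hugging Face Hub repo id

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # double the 8k training length for inference

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,  # MPT ships a custom model class on the Hub
)
```

MPT's ALiBi position encoding is what makes this kind of extrapolation beyond the training length feasible, since it does not rely on learned position embeddings of a fixed size.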


As an example, the company cites applications in industries such as healthcare or banking that do not want to hand over their data to OpenAI. The extended context window could be used to interpret lab results and provide insights into a patient's medical history by analyzing different inputs.

MosaicML targets OpenAI's proprietary platform

MPT-30B is also said to be more computationally efficient than Falcon or LLaMA, running on a single graphics card with 80 gigabytes of memory. Naveen Rao, co-founder and CEO of MosaicML, noted that the 40-billion-parameter Falcon model cannot run on a single GPU.
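For a sense of the arithmetic behind that claim: roughly 30 billion parameters in bfloat16 occupy about 60 GB of weights, which fits within an 80 GB accelerator. A minimal sketch, again assuming the mosaicml/mpt-30b Hub checkpoint, might look like this:

```python
# Minimal sketch: loading MPT-30B in half precision on a single 80 GB GPU.
# Repo id and precision choice are assumptions; ~30B parameters at
# 2 bytes/parameter need roughly 60 GB of weight memory.
import torch
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-30b",          # assumed Hub repo id
    torch_dtype=torch.bfloat16,  # half-precision weights
    trust_remote_code=True,
)
model = model.to("cuda:0")  # place the whole model on one 80 GB device
```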

However, Rao sees proprietary platforms like OpenAI as the real competition; open-source projects are ultimately all on the same team, he said. He emphasized that open-source language models are "closing the gap to these closed-source models." OpenAI's GPT-4 is still clearly superior, he said, but the time has come when they have "crossed the threshold where these models are actually extremely useful."

Summary
  1. MosaicML has released MPT-30B, an open-source language model that reportedly outperforms OpenAI's GPT-3, and in some areas Meta's LLaMA, despite having fewer parameters; it is released for commercial use.
  2. MPT-30B was trained on longer sequences and is well suited for applications with a lot of text or code.
  3. According to the startup, the open-source model is computationally efficient and can run on a single graphics card with 80 gigabytes of memory.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.