StableLM: Stable Diffusion start-up releases open source language models

British AI company Stability AI is known for its image generator Stable Diffusion. With StableLM, it is now launching a series of open-source language models.

StableLM-Alpha is available now in two sizes with 3 and 7 billion parameters. Larger models with 15 to 65 billion parameters are to follow.

The models are licensed under Creative Commons CC BY-SA 4.0 and can therefore be used commercially, as long as Stability AI is credited and derivative works are shared under the same license. Stability AI nevertheless positions the models primarily as research releases.

StableLM, like many other open-source language models, is trained on EleutherAI's "The Pile" dataset, but on an "experimental" version that Stability AI says is three times larger than the original, at 1.5 trillion tokens. The startup says it will provide details about the dataset "in due course."

Stability AI is currently in a legal battle with Getty Images over Stable Diffusion, for which it scraped images from the Getty Images database without explicit permission. That may be one reason for its reluctance to be transparent about the dataset; the current competitive environment may be another. Stability AI is also reportedly seeking new funding.

Few parameters, but good data

Despite the small number of parameters (3 and 7 billion), StableLM-Alpha shows "surprisingly good performance," Stability AI writes. The quality of the language model results from the "richness of the dataset," it adds, but it doesn't publish any benchmarks.

StableLM-Alpha has a context window of 4096 tokens, that is, the number of word and subword units (tokens) the model can take into account at once when generating an answer. This is on par with the GPT-3.5-based version of ChatGPT.
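To make the limit concrete, here is a minimal sketch of what a 4096-token window means in practice: the prompt and the generated answer have to share that budget, so older input beyond it is simply not seen by the model. The function `fit_to_context` is hypothetical, tokens are represented as plain integer IDs instead of real tokenizer output, and dropping the oldest tokens first is just one possible strategy, assumed here for illustration.

```python
from typing import List

# StableLM-Alpha's context window as reported by Stability AI.
CONTEXT_WINDOW = 4096


def fit_to_context(prompt_tokens: List[int], max_new_tokens: int,
                   window: int = CONTEXT_WINDOW) -> List[int]:
    """Hypothetical helper: keep only the most recent prompt tokens so that
    the prompt plus the generated answer fit inside the context window."""
    if not 0 <= max_new_tokens < window:
        raise ValueError("max_new_tokens must be in [0, window)")
    budget = window - max_new_tokens  # tokens left over for the prompt
    if len(prompt_tokens) <= budget:
        return prompt_tokens
    # Drop the oldest tokens first; the model never sees them.
    return prompt_tokens[-budget:]


# A 5000-token conversation with room reserved for a 256-token answer:
conversation = list(range(5000))
kept = fit_to_context(conversation, max_new_tokens=256)
print(len(kept))  # 3840 tokens of prompt survive
```

Reserving space for the answer up front matters because generated tokens count against the same 4096-token limit as the input.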

The StableLM-Alpha models with 3 and 7 billion parameters are now available on GitHub. Models with 15, 30, and 65 billion parameters are to follow, along with technical documentation and training parameters. A GPT-3-sized model with 175 billion parameters is also planned.

As a complement to StableLM-Alpha, Stability AI is releasing instruction models fine-tuned using the Alpaca recipe. For this, Stability AI combines five datasets: Alpaca, GPT4All, Dolly, ShareGPT, and HH.
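The Alpaca recipe boils down to supervised fine-tuning on instruction/response pairs. The sketch below shows the data-preparation side under stated assumptions: records from several sources are assumed to be already normalized into `instruction`/`input`/`output` fields (conversational sets such as ShareGPT or HH would need extra conversion), each record is rendered with a prompt template modeled on the one published with Stanford Alpaca, and the pooled examples are shuffled. The helpers `render` and `build_mixture` are hypothetical, and StableLM-Tuned-Alpha's actual prompt format and mixing strategy may differ.

```python
import random
from typing import Dict, List

# Templates modeled on Stanford Alpaca's published prompt format;
# StableLM-Tuned-Alpha's real format may differ.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def render(record: Dict[str, str]) -> str:
    """Turn one normalized instruction/response pair into a training string."""
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    # str.format ignores the unused `input` keyword in the no-input template.
    prompt = template.format(instruction=record["instruction"],
                             input=record.get("input", ""))
    return prompt + record["output"]


def build_mixture(sources: Dict[str, List[Dict[str, str]]],
                  seed: int = 0) -> List[str]:
    """Pool rendered records from several datasets and shuffle them."""
    pooled = [render(r) for records in sources.values() for r in records]
    random.Random(seed).shuffle(pooled)  # fixed seed for reproducibility
    return pooled


# Toy records standing in for two of the source datasets:
sources = {
    "alpaca": [{"instruction": "Name a primary color.", "output": "Red."}],
    "dolly": [{"instruction": "Summarize.", "input": "Cats sleep a lot.",
               "output": "Cats nap often."}],
}
for example in build_mixture(sources):
    print(example, end="\n---\n")
```

During fine-tuning, the loss is typically computed only on the response part of each string, so the model learns to answer rather than to repeat the prompt.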


These models are released as "StableLM-Tuned-Alpha", but they are intended for research purposes only and may not be used commercially (CC BY-NC-SA 4.0). A demo of the tuned 7B model is available on Hugging Face.

Stability AI also announces a program to build an RLHF-based open-source dataset specifically for AI assistants, which it plans to develop with partners such as the OpenAssistant community. Such a dataset could be used to fine-tune the StableLM-Alpha models in a way that permits commercial use. The current StableLM-Tuned-Alpha models do not allow this, because their training data includes text generated by ChatGPT, and commercial use would violate OpenAI's terms of use.

While Stable Diffusion was, and remains, a milestone for open-source image generation, Stability AI may face an uphill battle to repeat that success with language models. There are now numerous open-source offerings, and their quality keeps improving: the recently released OpenAssistant, for example, sets new quality standards for dialog-oriented open-source language models and is under active development.
