FlexOlmo, developed by the Allen Institute for AI, shows that organizations can collaboratively train language models on local datasets without sharing sensitive data.
In regulated industries, organizations often hold valuable training data but can't share it outside their walls. FlexOlmo works around this with a mixture-of-experts setup in which each expert is trained independently on a closed dataset. Instead of exchanging raw data, organizations train their own expert locally and share only the resulting model weights with the group.
The main challenge with independently trained experts is coordination. FlexOlmo tackles this by using a frozen public model as a fixed anchor: the public expert stays unchanged while each new expert is trained on local data alongside it. Because every expert is aligned to the same reference model, they can be combined later without additional joint retraining.
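To make the merge step concrete, here is a minimal PyTorch sketch of the idea: a frozen public feed-forward expert sits next to independently trained experts behind a shared router, and the combined layer is assembled without any joint training run. The class names, dimensions, and learned linear router are illustrative assumptions, not FlexOlmo's released code, and the routing is a simplified stand-in for the paper's scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertFFN(nn.Module):
    """A standard transformer feed-forward block, playing the role of one expert."""
    def __init__(self, d_model: int = 64, d_hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class MergedMoELayer(nn.Module):
    """Stitches a frozen public expert together with independently trained experts.

    The experts never see each other during training; they are only combined
    here, at merge time, behind a shared top-k router.
    """
    def __init__(self, public_expert: ExpertFFN, local_experts: list,
                 d_model: int = 64, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([public_expert, *local_experts])
        # Keep the public anchor frozen so it stays identical to the reference model.
        for p in self.experts[0].parameters():
            p.requires_grad = False
        self.router = nn.Linear(d_model, len(self.experts), bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model); route each token to its top-k experts.
        logits = self.router(x)                         # (batch, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # (batch, top_k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Each organization trains its ExpertFFN locally and shares only the weights;
# the merged layer is assembled afterwards, with no joint training run.
public = ExpertFFN()
news, code_expert = ExpertFFN(), ExpertFFN()   # stand-ins for contributed experts
layer = MergedMoELayer(public, [news, code_expert])
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64])
```

The property the sketch tries to capture is that the public expert's weights are never updated, so every contributed expert stays compatible with the same shared backbone.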
Flexibility for sensitive data
FlexOlmo is well-suited for cases where data access needs to be tightly controlled. Data sources can be activated or deactivated depending on the application. For example, toxic content might be included for research but excluded from general use.
The researchers demonstrated this by removing the news expert in a test run. As expected, performance on news-related tasks dropped, but results in other areas remained stable.

Even if licenses change or usage rights expire, data sources can be deactivated later without retraining the entire model. The final model has 37 billion parameters, with 20 billion active.
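Mechanically, opting a data source out amounts to never routing tokens to that expert. A small illustrative sketch in PyTorch follows; the function name, expert ordering, and routing details are assumptions for illustration, not FlexOlmo's actual interface.

```python
import torch

def route_with_optouts(router_logits: torch.Tensor, disabled: set, top_k: int = 2):
    """Pick top-k experts per token while skipping experts whose data source
    has been withdrawn. No retraining is needed: the withdrawn expert's
    weights simply never get consulted."""
    masked = router_logits.clone()
    masked[:, list(disabled)] = float("-inf")      # a disabled expert can never win
    weights, idx = masked.topk(top_k, dim=-1)
    weights = torch.softmax(weights, dim=-1)
    return weights, idx

# Hypothetical expert order: 0 = public, 1 = news, 2 = code, 3 = math
logits = torch.randn(5, 4)                         # 5 tokens, 4 experts
w, idx = route_with_optouts(logits, disabled={1})  # deactivate the news expert
print(idx)                                         # index 1 never appears
```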
Performance gains in real-world tests
The team evaluated FlexOlmo using a mix of public data and seven specialized datasets: News, Creative Writing, Code, Academic Papers, Educational Text, Math, and Reddit content.
When tested on 31 tasks, FlexOlmo improved on a model trained only on public data by an average of 41 percent. On general benchmarks, it even outperformed a hypothetical model given access to all of the data under the same compute budget; only a model trained on the full dataset with twice the compute came out slightly ahead.

Because data owners share only trained model weights, never the raw data, the risk of data leakage is minimal. In testing, attacks to recover training data succeeded just 0.7 percent of the time. For organizations with especially sensitive data, FlexOlmo supports differentially private training, which offers formal privacy guarantees; each participant can enable this option independently (see the sketch after this paragraph). The Allen Institute has also released OLMoTrace, a tool for tracing language model outputs back to their training sources.
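For a rough idea of what the differentially private option involves under the hood, below is a minimal sketch of DP-SGD in plain PyTorch: each example's gradient is clipped to a fixed norm, the sum is perturbed with Gaussian noise, and only then is the model updated. This illustrates the general technique rather than FlexOlmo's actual training code; a real setup would use a dedicated DP library and explicit privacy accounting, and all names here are illustrative.

```python
import torch
import torch.nn as nn

def dp_sgd_step(model: nn.Module, batch_x, batch_y,
                lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One differentially private SGD step: clip each example's gradient,
    sum the clipped gradients, add Gaussian noise, then update. Bounding each
    example's influence plus the noise is what gives the formal guarantee."""
    loss_fn = nn.CrossEntropyLoss()
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):             # per-example gradients
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() for p in params]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s += g * scale                         # clipped contribution

    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p -= lr * (s + noise) / len(batch_x)

model = nn.Linear(16, 4)                           # toy stand-in for an expert
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
dp_sgd_step(model, x, y)
```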