
Meta has released OMol25, the largest open dataset to date for AI-driven chemistry, and introduced UMA, a universal AI model designed to predict chemical properties of molecules and materials.


OMol25 contains data from more than 100 million high-precision molecular calculations, far more than any previous open dataset in the field. According to Meta, generating the dataset required more than 6 billion CPU core hours of compute. The dataset covers a wide range of molecules, from small organic compounds and biomolecules (such as protein fragments and DNA segments) to metal complexes and electrolytes. It also includes different charge and spin states, multiple conformations (spatial arrangements of the same molecule), and chemical reactions.

The aim is to help AI models learn how molecules behave across a broad spectrum of scenarios. OMol25 is openly available for uses such as drug discovery, battery material development, and catalyst research. In addition to energy and force values, the dataset includes extra details like charge distributions, orbitals, and other chemical properties.
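Energy and force labels are closely linked: the force on an atom is the negative derivative of the energy with respect to that atom's position. The short Python sketch below illustrates the relationship on a toy harmonic bond. The potential, the constants `k` and `r0`, and all function names are hypothetical and chosen only for illustration; nothing here is taken from OMol25 itself.

```python
def bond_energy(r, k=450.0, r0=1.1):
    """Toy harmonic bond energy E(r) = 0.5 * k * (r - r0)^2 (arbitrary units)."""
    return 0.5 * k * (r - r0) ** 2


def bond_force(r, k=450.0, r0=1.1):
    """Analytic force F(r) = -dE/dr = -k * (r - r0)."""
    return -k * (r - r0)


def numerical_force(energy_fn, r, h=1e-6):
    """Central finite-difference estimate of -dE/dr."""
    return -(energy_fn(r + h) - energy_fn(r - h)) / (2 * h)


r = 1.25  # stretched bond length (hypothetical value)
analytic = bond_force(r)
numeric = numerical_force(bond_energy, r)
print(f"analytic force: {analytic:.4f}, finite-difference force: {numeric:.4f}")
```

A model trained on both quantities can be checked for this kind of consistency: its predicted forces should agree with the slope of its predicted energies.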

The OMol25 dataset is available on Hugging Face.


UMA: A Universal AI Model for Molecules and Materials

Alongside OMol25, Meta is launching UMA (Universal Models for Atoms), a new family of AI models trained on OMol25 and other datasets. UMA can predict chemical properties at the atomic level and, according to Meta, does so much faster than traditional approaches.

Unlike earlier methods that required a specialized model for each task, UMA handles a wide range of applications—from molecular simulations (such as for drug discovery) to materials and catalysis research. It’s built on modern graph neural networks and uses a "Mixture of Linear Experts" architecture to combine speed with high accuracy. In benchmark tests, UMA achieved results that previously were only possible with specialized, fine-tuned models.
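The article does not describe how a "Mixture of Linear Experts" works internally. One plausible reading of the name, treated here as an assumption, is that several expert weight matrices are blended by a set of mixing coefficients into a single linear layer before it is applied. Because the blend is linear, the merged layer gives the same result as evaluating every expert separately, at the cost of only one matrix multiplication. The NumPy sketch below illustrates that idea; the names `mole_merge` and `mole_forward`, the sizes, and the random softmax coefficients are all hypothetical, and this is not Meta's implementation.

```python
import numpy as np


def mole_merge(expert_weights, coeffs):
    """Blend K expert matrices of shape (K, d_out, d_in) into one (d_out, d_in) matrix."""
    return np.tensordot(coeffs, expert_weights, axes=1)


def mole_forward(x, expert_weights, coeffs):
    """Apply the single merged linear layer to the input vector x."""
    return mole_merge(expert_weights, coeffs) @ x


rng = np.random.default_rng(0)
K, d_in, d_out = 4, 8, 3  # hypothetical sizes
expert_weights = rng.normal(size=(K, d_out, d_in))

# Hypothetical mixing coefficients: a softmax over random logits.
logits = rng.normal(size=K)
coeffs = np.exp(logits) / np.exp(logits).sum()

x = rng.normal(size=d_in)

# One matrix multiplication with the merged weights ...
merged_out = mole_forward(x, expert_weights, coeffs)
# ... matches the weighted sum of K separate expert evaluations.
per_expert_out = sum(c * (W @ x) for c, W in zip(coeffs, expert_weights))
print(np.allclose(merged_out, per_expert_out))
```

Under this assumption, the capacity of many experts is available at roughly the inference cost of one dense layer, which would fit the article's description of combining speed with accuracy.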

With UMA, simulations and calculations that once took days can now be completed in seconds. Meta says this could allow researchers to screen thousands of new molecules for their potential as drugs or battery materials before any lab synthesis is needed.

UMA models are available on Hugging Face.

New Methods for Fast AI Simulations

Traditional AI models have typically needed large amounts of training data to generate new molecular structures. Meta is now introducing "Adjoint Sampling," a new method that lets AI models learn and propose novel structures even when no real-world examples exist.


This technique draws on concepts from stochastic control theory and uses diffusion processes, which Meta’s team says are especially well-suited for simulating molecules. Adjoint Sampling enables the rapid exploration of many structural variants with just a few calculations.

In early tests, the method generated molecular conformations whose quality matched, and often exceeded, that of conformations produced by classical software, especially for molecules with many flexible components.

The model, code, and more information are available on Hugging Face and GitHub.

Despite these advances, Meta notes there are still open challenges. Some areas of chemistry—such as polymers, certain metals, or complex protonation states—are not yet fully covered. The AI models also need to improve their ability to predict charges, spins, and long-range interactions.

Summary
  • Meta has published OMol25, the largest open dataset for AI-supported chemistry. It contains the results of more than 100 million high-precision molecular calculations, covers an enormous range of chemical structures and states, and is freely available.
  • Meta also presents UMA, a universal AI model that can predict the chemical properties of molecules and materials at the atomic level faster than conventional methods. UMA combines many previously specialized tasks into one model.
  • Meta also introduces "Adjoint Sampling," a new method that allows AI models to propose new molecular structures without large amounts of training data and to explore many structural variants, particularly for flexible molecules. However, some areas of chemistry remain challenging.
Max is the managing editor of THE DECODER, bringing his background in philosophy to explore questions of consciousness and whether machines truly think or just pretend to.