
Meta AI releases a protein database with the structures of hundreds of millions of proteins. The predictions come from a new AI model that is significantly faster than Deepmind's AlphaFold.

In December 2020, Deepmind unveiled AlphaFold 2, an AI system for predicting protein structures. Its predictions are so much more accurate than those of alternative methods that some scientists consider it a solution to the nearly 50-year-old protein folding problem. In July 2021, Deepmind released AlphaFold 2 as open source, along with a comprehensive protein structure database.

Protein structures predicted by ESMFold
When predicting a structure, Meta's AI also outputs a numerical value that expresses its confidence in the prediction. | Image: Meta

Now, researchers at Meta are also demonstrating their progress in using AI models to predict protein structures.

ESM Metagenomic Atlas includes hundreds of millions of structure predictions

The ESM Metagenomic Atlas published by Meta includes structure predictions for 617 million proteins from microbes living in soil, in the ocean, or in the human body. Such microbial proteins far outnumber those found in animals and plants, yet they are among the least understood.


“These are the structures we know the least about. These are incredibly mysterious proteins. I think they offer the potential for great insight into biology,” says Alexander Rives, the research lead for Meta AI’s protein team.

Meta's ESMFold relies on a large language model trained with amino acid sequences. | Image: Meta

The structure predictions come from Meta's ESMFold, an AI model built around a large language model trained on the amino acid sequences of known proteins. After training, the language model can fill in missing amino acids in a sequence; in a second step, ESMFold uses what the model has learned to predict the protein's 3D structure.

Meta's ESMFold is less accurate than AlphaFold, but significantly faster

According to Meta, ESMFold does not match the accuracy of Deepmind's AlphaFold, but it is 60 times faster. This makes Meta's approach much easier to scale to large datasets, such as the newly published collection of proteins derived from metagenomic DNA. The vast majority of entries in the database come from organisms that have never been studied in the lab.

ESMFold needed two weeks to compute the 617 million predictions. The model rated about one-third of them as high quality. For these, researchers can be reasonably confident that the overall shape of the protein is correct, and in some cases finer atomic-level details are discernible.
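The reported figures allow a rough sanity check of ESMFold's throughput. The sketch below is a back-of-the-envelope calculation only. It assumes the "two weeks" were exactly 14 days of continuous computation and treats "one-third" as an exact fraction. The article does not say what hardware Meta used, so the result is an aggregate rate for the whole run, not the speed of a single machine.

```python
# Back-of-the-envelope arithmetic using only the figures reported in the article.
# Assumptions: "two weeks" = exactly 14 days of nonstop computation,
# and "one-third" rated high quality is treated as an exact fraction.
TOTAL_PREDICTIONS = 617_000_000  # structures in the ESM Metagenomic Atlas
DAYS = 14                        # "two weeks", taken literally
SECONDS_PER_DAY = 24 * 60 * 60


def throughput_per_second(total: int, days: float) -> float:
    """Average number of structures predicted per second of wall-clock time."""
    return total / (days * SECONDS_PER_DAY)


def high_quality_estimate(total: int, fraction: float = 1 / 3) -> int:
    """Approximate number of predictions the model rated as high quality."""
    return round(total * fraction)


if __name__ == "__main__":
    rate = throughput_per_second(TOTAL_PREDICTIONS, DAYS)
    high_quality = high_quality_estimate(TOTAL_PREDICTIONS)
    print(f"~{rate:,.0f} structures per second")
    print(f"~{high_quality / 1e6:.0f} million high-quality predictions")
```

Run as a script, it reports roughly 510 structures per second and about 206 million high-quality predictions. Both numbers are only as precise as the rounded figures they start from.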


Meta is publishing the ESMFold models and a pre-print paper in addition to the ESM Metagenomic Atlas.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Meta publishes the ESM Metagenomic Atlas, a database of 617 million protein structure predictions from microorganisms.
  • The structures were predicted by the AI model ESMFold, which is based on a language model trained with amino acid sequences.
  • ESMFold is less accurate than AlphaFold, but 60 times faster. Meta took only two weeks to make the more than 600 million predictions.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.