OpenAI's GPT-4 surprises scientists with its ability to model basic protein structures

Aug 21, 2024


Key Points

A study by Rutgers University researchers shows that the GPT-4 language model can predict the structures of simple amino acids and proteins with surprising accuracy, despite not being specifically designed for structural biology tasks.
GPT-4 accurately modeled the atomic composition, bond lengths, and angles of the 20 standard amino acids, and its prediction of an alpha-helix structure was comparable to experimentally determined structures. It also correctly identified the amino acids and distances between interacting atoms in the binding of an antiviral drug to the SARS-CoV-2 main protease enzyme.
While GPT-4's modeling capabilities are still rudimentary compared to dedicated AI tools like AlphaFold 3, the team says the study opens up new possibilities for using language models in structural biology and other life science applications, and warrants further research into the capabilities and limitations of generative AI in these fields.

A study by Rutgers University shows that the GPT-4 language model can reproduce simple amino acid and protein structures with unexpected precision.

Researchers at Rutgers University have explored the capabilities of the GPT-4 AI language model in basic structural biology tasks. The study, published in Scientific Reports, reveals that the model can make surprisingly accurate predictions about molecular structures.

The scientists asked GPT-4 to model the three-dimensional structures of the 20 standard amino acids. The AI model accurately predicted the atomic composition, bond lengths, and bond angles. However, it made errors when modeling ring structures and stereochemical configurations.
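To make "bond lengths and angles" concrete, here is a minimal sketch of how such values can be checked once a model has produced 3D atomic coordinates. It is not the evaluation code from the study. The function names are hypothetical, and the reference values (N-CA about 1.46 Å, CA-C about 1.52 Å, N-CA-C about 111°) are approximate textbook figures used only to build an illustrative backbone fragment.

```python
import math

def bond_length(a, b):
    """Euclidean distance between two atoms, in the units of the input (here Å)."""
    return math.dist(a, b)

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, measured at the central atom b."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    cos_t = dot / (math.hypot(*ba) * math.hypot(*bc))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_t))

# Illustrative N-CA-C fragment built from approximate textbook geometry
# (assumed values, not data from the study).
N_CA, CA_C, N_CA_C = 1.46, 1.52, 111.0
theta = math.radians(N_CA_C)
ca = (0.0, 0.0, 0.0)
n = (N_CA, 0.0, 0.0)
c = (CA_C * math.cos(theta), CA_C * math.sin(theta), 0.0)

print(f"N-CA length: {bond_length(n, ca):.2f} Å")
print(f"CA-C length: {bond_length(ca, c):.2f} Å")
print(f"N-CA-C angle: {bond_angle(n, ca, c):.1f}°")
```

The same two functions can be applied to coordinates from any source, which is one plausible way to compare a language model's output against reference geometry.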

In another experiment, GPT-4 was asked to model the structure of an alpha-helix, a common protein structural element. This required integrating the Wolfram plugin for mathematical calculations. The resulting model was comparable to experimentally determined alpha-helix structures.
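The kind of calculation involved in placing atoms along a helix can be sketched as follows. This is an illustration under stated assumptions, not the procedure GPT-4 or the Wolfram plugin actually followed, which the article does not describe. It places only the alpha-carbon atoms on an ideal helix using commonly cited approximate parameters: 3.6 residues per turn, a rise of 1.5 Å per residue, and an alpha-carbon radius of about 2.3 Å. The function name is hypothetical.

```python
import math

# Approximate textbook parameters for an ideal alpha-helix (assumptions).
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5   # Å along the helix axis
CA_RADIUS = 2.3          # Å from the axis to each alpha-carbon

def ideal_helix_ca(n_residues):
    """Return alpha-carbon coordinates for an ideal helix along the z axis."""
    step = 2 * math.pi / RESIDUES_PER_TURN  # 100 degrees of rotation per residue
    return [
        (CA_RADIUS * math.cos(i * step),
         CA_RADIUS * math.sin(i * step),
         i * RISE_PER_RESIDUE)
        for i in range(n_residues)
    ]

helix = ideal_helix_ca(10)
adjacent = math.dist(helix[0], helix[1])  # neighbouring alpha-carbons
i_to_i4 = math.dist(helix[0], helix[4])   # residues i and i+4, roughly one turn apart

print(f"CA(i) to CA(i+1): {adjacent:.2f} Å")
print(f"CA(i) to CA(i+4): {i_to_i4:.2f} Å")
```

With these parameters, neighbouring alpha-carbons come out close to 3.8 Å apart, the spacing typically seen in real proteins. That gives a quick sanity check before comparing a generated helix with an experimental one.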

Additionally, GPT-4 analyzed the binding between the antiviral drug Nirmatrelvir and the main protease enzyme of SARS-CoV-2. The model correctly identified the involved amino acids and accurately specified the distances between interacting atoms.

Results open new possibilities for using language models in biology

These capabilities are remarkable because GPT-4 was not specifically developed for structural biology tasks. The researchers note that it is unclear how GPT-4 arrives at its models. It could be drawing on existing atomic coordinates in its training data or calculating the structures from scratch; settling the question would require extensive further study.

While dedicated AI tools like AlphaFold 3 can predict more complex structures, GPT-4 shows promise for basic structural biology tasks, according to the researchers. Its modeling capabilities remain rudimentary, however, and have limited practical applications.

Nevertheless, the team says the study sets a precedent for the application of this technology in structural biology. The researchers recommend further study of the capabilities and limitations of generative AI, not only in structural biology, but also for other potential life science applications.


Source: Nature