A study by Rutgers University shows that the GPT-4 language model can reproduce the structures of simple amino acids and proteins with unexpected precision.
Researchers at Rutgers University have explored the capabilities of the GPT-4 AI language model in basic structural biology tasks. The study, published in Scientific Reports, reveals that the model can make surprisingly accurate predictions about molecular structures.
The scientists asked GPT-4 to model the three-dimensional structures of the 20 standard amino acids. The AI model accurately predicted the atomic composition, bond lengths, and bond angles. However, it made errors when modeling ring structures and stereochemical configurations.
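To make "bond lengths and angles" concrete, here is a minimal Python sketch of how such values can be computed from a set of 3D atomic coordinates. It is not the researchers' evaluation code; the function names and the example coordinates are hypothetical and chosen only for illustration.

```python
import math

def bond_length(a, b):
    """Distance between two atoms, in the same unit as the coordinates."""
    return math.dist(a, b)

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with atom b at the vertex."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    cos_theta = dot / (math.hypot(*ba) * math.hypot(*bc))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Hypothetical coordinates (in angstroms) for three bonded atoms.
# These are placeholder values, not data from the study.
n_atom = (0.0, 0.0, 0.0)
ca_atom = (1.46, 0.0, 0.0)
c_atom = (2.0, 1.42, 0.0)

print(round(bond_length(n_atom, ca_atom), 2))
print(round(bond_angle(n_atom, ca_atom, c_atom), 1))
```

Comparing values like these against reference geometries is one plausible way to judge whether a generated structure is chemically reasonable.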
In another experiment, GPT-4 was asked to model the structure of an alpha-helix, a common protein structural element. This required integrating the Wolfram plugin for mathematical calculations. The resulting model was comparable to experimentally determined alpha-helix structures.
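For readers curious about the kind of calculation involved, the following Python sketch places the alpha-carbon atoms of an idealized alpha-helix on a simple geometric helix. It is an illustration only and not the method used by GPT-4 or the researchers. The parameters are commonly cited textbook values, assumed here rather than taken from the study: about 100 degrees of rotation and 1.5 angstroms of rise per residue, with the alpha-carbons roughly 2.3 angstroms from the helix axis.

```python
import math

# Assumed textbook parameters for an ideal alpha-helix (not from the study).
RADIUS = 2.3      # distance of each alpha-carbon from the helix axis, in angstroms
RISE = 1.5        # rise along the axis per residue, in angstroms
TURN_DEG = 100.0  # rotation per residue, in degrees (3.6 residues per turn)

def helix_ca_trace(n_residues):
    """Return (x, y, z) alpha-carbon positions for an idealized helix along z."""
    coords = []
    for i in range(n_residues):
        theta = math.radians(TURN_DEG * i)
        coords.append((RADIUS * math.cos(theta),
                       RADIUS * math.sin(theta),
                       RISE * i))
    return coords

trace = helix_ca_trace(10)
for x, y, z in trace:
    print(f"{x:7.3f} {y:7.3f} {z:7.3f}")
```

With these parameters, neighboring alpha-carbons come out about 3.8 angstroms apart, which matches the typical spacing seen in real proteins. A full model would also need backbone and side-chain atoms, which this sketch leaves out.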
Additionally, GPT-4 analyzed the binding between the antiviral drug Nirmatrelvir and the main protease enzyme of SARS-CoV-2. The model correctly identified the involved amino acids and accurately specified the distances between interacting atoms.
Results open new possibilities for using language models in biology
These capabilities are remarkable because GPT-4 was not specifically developed for structural biology tasks. The researchers note that it is unclear how GPT-4 arrives at its models. It could be reproducing existing atomic coordinates from its training data or calculating the structures from scratch; a definitive conclusion would require further extensive studies.
While dedicated AI tools like AlphaFold 3 can predict more complex structures, GPT-4 shows promise for basic structural biology tasks, according to the researchers. For now, its modeling capabilities remain rudimentary and have limited practical applications.
Nevertheless, the team says the study sets a precedent for the application of this technology in structural biology. The researchers recommend further study of the capabilities and limitations of generative AI, not only in structural biology, but also for other potential life science applications.