OpenAI is stepping into the life sciences with a new LLM designed to optimize proteins. Early testing suggests the system may outperform human researchers at certain protein-engineering tasks.
Working with the startup Retro Biosciences, OpenAI has created a specialized language model called GPT-4b micro. The model focuses on improving Yamanaka factors, proteins that can reprogram ordinary cells into stem cells. Scientists see this process as a promising route to rejuvenating tissue and potentially growing human organs.
The team trained GPT-4b micro on protein sequences from many species, along with data on how proteins interact with one another. Much as ChatGPT completes sentences, the model proposes modified versions of proteins, which researchers can then test in the lab.
This approach differs from Google DeepMind's AlphaFold, which uses a diffusion network similar to those behind AI image generators. According to Retro CEO Joe Betts-Lacroix, the language model approach works particularly well with Yamanaka proteins, which are "floppy and unstructured" by nature. However, the team still doesn't know exactly how the model reaches its conclusions.
Early results show promise, but lack external validation
OpenAI researcher John Hallman told MIT Technology Review that "across the board," the model's protein suggestions "seem better than what the scientists were able to produce by themselves." In tests, two Yamanaka factors performed up to 50 times better than existing versions.
While these results sound encouraging, they should be treated with caution. Outside scientists won't be able to verify the claims until OpenAI and Retro publish their research, something they say they plan to do but haven't yet.
The model isn't available to the public, and there's no clear timeline for turning it into a product. OpenAI hasn't decided whether to fold the technology into its existing reasoning models or develop it as a separate tool.
It's worth noting that OpenAI CEO Sam Altman has invested $180 million in Retro Biosciences.