An international team of researchers has launched Polymathic AI, an initiative to develop artificial intelligence that learns from data and simulations across scientific fields to generate new insights.
The Polymathic AI initiative aims to accelerate the development of versatile foundation models for machine learning in science. The goal is to bring the benefits of foundation models, which have so far been applied mainly to image and language processing, to scientific machine learning.
The team behind Polymathic AI brings together specialists in physics, astrophysics, mathematics, artificial intelligence, and neuroscience from institutions including the Simons Foundation and its Flatiron Institute, New York University, the University of Cambridge, Princeton University, and Lawrence Berkeley National Laboratory.
The initiative was announced in tandem with the publication of related scientific papers on arXiv.org, an open-access repository.
Foundation model for science
According to Shirley Ho, project director and group leader at the Flatiron Institute's Center for Computational Astrophysics in New York, the project will "completely change how people use AI and machine learning in science."
The idea: starting from a pre-trained model, also called a foundation model, can be faster and more accurate than building a scientific model from scratch. This holds even if the training data is not directly relevant to the scientific problem at hand.
The challenge is to build AI models that can use information from heterogeneous data sets and different scientific domains. Unlike in fields such as natural language processing, scientific data has no standard representation comparable to text.
Polymathic AI therefore treats numbers as actual numerical values, rather than as characters the way language models treat letters and punctuation. The training data consists of real scientific datasets that capture the underlying physics of the cosmos. The AI model is designed to learn from numerical data and physical simulations across scientific fields, helping to model phenomena such as giant stars and the Earth's climate.
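The contrast between the two representations can be illustrated with a small sketch. Here the "embedding" is a randomly initialized linear map standing in for learned weights; the function and dimensions are illustrative assumptions, not Polymathic AI's actual architecture. The point is that a real-valued embedding preserves numeric closeness, which a character tokenizer discards.

```python
import numpy as np

# A language model tokenizes "3.14" as characters or subwords,
# losing the numeric structure of the value:
text_tokens = list("3.14")  # ['3', '.', '1', '4'] -- four unrelated symbols

# A numerical foundation model instead embeds the value directly.
# Hypothetical sketch: project a scalar measurement into an embedding
# space with a linear map (weights random here, learned in practice).
rng = np.random.default_rng(0)
embed_dim = 8
W = rng.normal(size=(1, embed_dim))  # stand-in for learned projection weights
b = rng.normal(size=embed_dim)

def embed_value(x: float) -> np.ndarray:
    """Map a scalar measurement to an embedding vector."""
    return np.array([x]) @ W + b

v1 = embed_value(3.14)
v2 = embed_value(3.15)
# Nearby values get nearby embeddings, unlike character tokens.
assert v1.shape == (embed_dim,)
assert np.linalg.norm(v1 - v2) < np.linalg.norm(v1 - embed_value(100.0))
```

Under this kind of representation, "3.14" and "3.15" are close in embedding space, while their character sequences share no such relationship.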
Models developed through this initiative can then be used as a starting point and refined by scientists for specific applications.
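The pre-train-then-refine workflow can be sketched in miniature. In this toy version, a frozen random featurizer stands in for the pre-trained foundation model, and "refinement" fits only a lightweight linear head on a small task-specific dataset; the data, featurizer, and fitting method are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pre-trained "foundation" feature extractor: weights frozen.
W_pre = rng.normal(size=(3, 16))

def features(x: np.ndarray) -> np.ndarray:
    """Frozen featurizer (a random stand-in for a real pre-trained model)."""
    return np.tanh(x @ W_pre)

# A scientist's small labeled dataset for one specific downstream task.
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2.0 - X[:, 1]  # synthetic target

# Refinement here = fitting only a small head on the frozen features
# (ordinary least squares), far cheaper than training from scratch.
Phi = features(X)
head, *_ = np.linalg.lstsq(Phi, y, rcond=None)

pred = Phi @ head
mse = float(np.mean((pred - y) ** 2))
assert mse < float(np.var(y))  # the fitted head beats predicting the mean
```

In practice the "head" would itself be a trainable network and refinement would use gradient descent, but the division of labor is the same: the expensive general-purpose model is reused, and only a small task-specific part is trained.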
Interdisciplinary research on Polymathic AI
Miles Cranmer, co-initiator of the project and a member of the Department of Applied Mathematics and Theoretical Physics and the Institute of Astronomy at the University of Cambridge, explains that one of the main problems with academic work using foundation models is the computational cost. However, he says, the collaboration with the Simons Foundation provides the resources necessary to test such models for scientific research.
Co-initiator Siavash Golkar, a visiting scientist at the Flatiron Institute's Center for Computational Astrophysics, sees Polymathic AI as a tool to help scientists discover connections between disciplines. By bringing together data from many fields, the AI is meant to ease the challenge of staying current across all of them.
François Lanusse, co-initiator and cosmologist at the Centre national de la recherche scientifique (CNRS) in France, emphasizes that, unlike existing AI tools, Polymathic AI will not be limited to specific use cases and data. Instead, it will use data from various sources and domains and apply its multidisciplinary knowledge to a wide range of scientific problems.
Shirley Ho emphasizes the importance of transparency and openness in the Polymathic AI project, saying that the goal is to democratize AI for science and provide a pre-trained model that can improve scientific analysis of various issues and domains.
Very happy to start this initiative @PolymathicAI with our amazing team members and scientific advisors!
The mission: Building #opensource AI models trained across disciplines that deliver scientific discoveries 💪
— Shirley Ho (@cosmo_shirley) October 10, 2023
Meta attempted a similar project with Galactica, an open-source AI model optimized for scientific tasks but trained primarily on language. Other researchers criticized the model so heavily for the misinformation it generated that Meta took it offline after a few days.