A scientific paper generated by Sakana AI's system passed peer review at an AI workshop before being withdrawn as planned.
The paper represents the first fully AI-generated research to complete a standard review process, according to the company. Sakana AI introduced the system's predecessor in August 2024.
The experiment was conducted in collaboration with the organizers of a workshop at the International Conference on Learning Representations (ICLR). Of three submitted AI-generated papers, one achieved an average rating of 6.33, just above the workshop's acceptance threshold.
The accepted paper, titled "Compositional Regularization: Unexpected Obstacles in Enhancing Neural Network Generalization," examined regularization methods for neural networks and reported negative research findings.
Understanding the AI research process
The AI Scientist-v2 independently developed the scientific hypothesis, proposed experiments, wrote code, conducted the research, analyzed data, and authored the manuscript. Human researchers only provided the topic and selected the most promising papers for submission.
However, the paper was accepted only at the workshop level, not for the main conference. Workshops typically have much higher acceptance rates, at 60-70% compared with 20-30% for main conferences. Sakana AI acknowledged that none of the three papers would have met its internal criteria for acceptance at the main ICLR conference in their current form.
Planned withdrawal and identified errors
Following a pre-established agreement, the paper was withdrawn after completing peer review. This decision was part of the experimental protocol, since the scientific community has not yet established standards for handling AI-generated manuscripts.
During their internal review, researchers found that The AI Scientist-v2 occasionally made citation errors. For example, it incorrectly attributed "LSTM-based neural network" to Goodfellow (2016) instead of the correct authors, Hochreiter and Schmidhuber (1997).
These issues demonstrate that Sakana's system still exhibits the common limitations of modern language models.