
Johns Hopkins University and AMD have developed Agent Laboratory, a new open-source framework that pairs human creativity with AI-powered workflows.


Unlike other AI tools that try to come up with research ideas on their own, Agent Laboratory focuses on helping scientists carry out their research more efficiently.

"We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery," the researchers write.

The Agent Laboratory enables a fully automated research process from literature search to report generation. Multiple AI agents work together in a virtual lab setting to conduct and document scientific research. | Image: Schmidgall et al.

How the Virtual Lab gets work done

The process follows the typical path of academic research. It starts with a PhD agent that digs through academic papers using the arXiv API, gathering and organizing all the relevant research for the project.
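As a rough illustration of what that retrieval step involves, the Python sketch below queries the public arXiv API for papers on a topic and collects titles and abstracts, the kind of raw material the PhD agent would then summarize. It uses only the standard arXiv Atom feed and is not code from Agent Laboratory itself.

```python
# Minimal sketch of the literature-review step: fetch candidate papers from
# the public arXiv API. Standard arXiv Atom-feed usage, not Agent Laboratory code.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def search_arxiv(query: str, max_results: int = 5) -> list[dict]:
    params = urllib.parse.urlencode({
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url) as response:
        feed = ET.fromstring(response.read())
    papers = []
    for entry in feed.findall(f"{ATOM}entry"):
        papers.append({
            "title": entry.findtext(f"{ATOM}title", "").strip(),
            "abstract": entry.findtext(f"{ATOM}summary", "").strip(),
            "url": entry.findtext(f"{ATOM}id", "").strip(),
        })
    return papers

if __name__ == "__main__":
    for paper in search_arxiv("LLM agents for scientific research"):
        print(paper["title"])
```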


From there, PhD and postdoc agents team up to build a detailed research plan based on what they learned from the literature. Through ongoing discussions, they map out exactly what needs to happen to test the researcher's ideas.
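A loop like the following gives a feel for how such a role-based dialogue can be wired up, assuming an OpenAI-style chat client. The role prompts, model choice, and number of rounds are invented for illustration and are not taken from the paper.

```python
# Illustrative sketch of the plan-formulation phase: a "PhD" prompt and a
# "postdoc" prompt take turns over a few rounds until a plan emerges.
# Prompts, model, and round count are assumptions, not Agent Laboratory's setup.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PHD = "You are a PhD student. Propose or revise a concrete experiment plan."
POSTDOC = "You are a postdoc. Critique the latest plan and suggest improvements."

def chat(system_prompt: str, transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

def formulate_plan(literature_summary: str, rounds: int = 3) -> str:
    transcript = f"Literature review:\n{literature_summary}"
    plan = ""
    for _ in range(rounds):
        plan = chat(PHD, transcript)           # PhD agent drafts or revises the plan
        transcript += f"\n\nPlan draft:\n{plan}"
        feedback = chat(POSTDOC, transcript)   # postdoc agent critiques it
        transcript += f"\n\nFeedback:\n{feedback}"
    return plan
```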

Next, an ML engineer agent rolls up its sleeves and does the technical work, using a specialized tool called mle-solver to create, test, and fine-tune machine learning code.
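Conceptually, that step amounts to a generate-run-repair loop. The sketch below shows a heavily simplified version of the idea: an LLM drafts an experiment script, the script is executed, and any error output is fed back for another attempt. Prompts, model choice, and the simple pass/fail check are assumptions for illustration, not mle-solver's actual implementation.

```python
# Heavily simplified sketch of a generate-run-repair loop in the spirit of
# mle-solver: draft an experiment script with an LLM, run it, and feed any
# traceback back for a retry. Prompts and the pass/fail check are assumptions.
import subprocess
import sys
import tempfile

from openai import OpenAI

client = OpenAI()

def generate_code(task: str, last_error: str | None) -> str:
    prompt = f"Write a complete, self-contained Python script for this ML experiment:\n{task}"
    if last_error:
        prompt += f"\n\nThe previous attempt failed with:\n{last_error}\nReturn a fixed script."
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content  # a real system would strip markdown fences here

def run_candidate(code: str, timeout: int = 600) -> tuple[bool, str]:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True,
                          text=True, timeout=timeout)
    return proc.returncode == 0, proc.stdout + proc.stderr

def solve(task: str, max_attempts: int = 5) -> str | None:
    error = None
    for _ in range(max_attempts):
        code = generate_code(task, error)
        ok, output = run_candidate(code)
        if ok:
            return code   # keep the first script that runs without errors
        error = output    # otherwise hand the output back to the LLM
    return None
```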

Specialized tools such as mle-solver and paper-solver automate complex research tasks from literature search to report generation. | Image: Schmidgall et al.

When the experiments are complete, PhD and professor agents work together to write up the findings. Using a tool called paper-solver, they generate and refine a comprehensive academic report through several iterations until it clearly presents the research in a format humans can easily understand.
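In spirit, that process can be pictured as a draft-review-revise loop over a LaTeX document, sketched below. The section list, prompts, and fixed number of revision passes are illustrative assumptions; the prompts actually used by paper-solver are documented in the paper's appendix.

```python
# Sketch of a draft-review-revise loop in the spirit of paper-solver: write a
# LaTeX report section by section, then refine the draft against reviewer-style
# feedback. Section names, prompts, and pass counts are assumptions.
from openai import OpenAI

client = OpenAI()
SECTIONS = ["Abstract", "Introduction", "Methods", "Results", "Discussion"]

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def write_report(plan: str, results: str, revisions: int = 2) -> str:
    draft = ""
    for section in SECTIONS:
        draft += "\n\n" + ask(
            f"Write the {section} section of a LaTeX paper.\n"
            f"Research plan:\n{plan}\n\nExperimental results:\n{results}"
        )
    for _ in range(revisions):
        feedback = ask(f"Review this draft like a NeurIPS reviewer and list concrete fixes:\n{draft}")
        draft = ask(
            "Revise the draft to address the feedback. Return the full LaTeX source.\n"
            f"Draft:\n{draft}\n\nFeedback:\n{feedback}"
        )
    return draft
```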

The researchers published a sample thesis and documented all the specific prompts used throughout the research process in the appendix of their paper.

Human reviewers prefer o1-preview

When human reviewers assessed papers produced by Agent Laboratory, the results varied noticeably depending on which AI model powered the system. OpenAI's o1-preview came out on top overall, especially for clarity and validity, while o1-mini earned the highest marks for experimental quality.


AI reviewers and human reviewers saw things quite differently. The AI consistently gave scores about 2.3 points higher than humans did, particularly when it came to how clear and well-presented the papers were.

The automated reviewers rated the generated papers an average of 2.3 points higher than human reviewers. | Image: Schmidgall et al.

The system also lets researchers work alongside the AI in what's called co-pilot mode. While this approach typically scored better than fully automated papers, it sometimes came at the cost of experimental quality and usefulness.

Bottom line

The researchers found that Agent Laboratory can produce papers quite cheaply - just $2.33 per paper when using GPT-4o. Among the different AI models tested, GPT-4o offered the best balance of performance and cost, while o1-preview achieved similar success rates but took longer and cost more.

GPT-4o achieves the highest overall performance at a lower cost, while o1-preview achieves similar success rates at a significantly higher cost. | Image: Schmidgall et al.

The team acknowledges several limitations: the AI's tendency to overrate its own work, the constraints of automated research, and the risk of generating incorrect information.


While progress in developing more capable large language models seems to have slowed recently, researchers and companies are shifting their focus to creating agent frameworks that connect multiple LLMs and tools. These frameworks often mirror the structures and workflows of human organizations, whether for conducting focus groups or translating long documents.

Summary
  • Agent Laboratory, an open-source framework developed by AMD and Johns Hopkins University, combines human ideation with AI-driven workflows to accelerate machine learning research.
  • The workflow is divided into three main phases: literature search by a PhD agent, creation of a detailed research plan by PhD and postdoc agents, and implementation and execution of experiments by an ML engineer agent.
  • Human ratings showed variability in performance across different large language models (LLMs), with o1-preview being perceived as the most useful in the Agent Laboratory framework.
Jonathan works as a freelance tech journalist for THE DECODER, focusing on AI tools and how GenAI can be used in everyday work.