
A large-scale study involving more than 100 natural language processing researchers finds that AI-generated research ideas are rated as more novel than those from human experts, though they may be less feasible to carry out.


Stanford University researchers conducted a carefully controlled comparative study to examine whether large language models can produce novel research ideas comparable to those of human experts. The study involved more than 100 highly qualified researchers in the NLP field.

AI ideas seen as more innovative, but possibly less feasible

Nearly 300 evaluations across all experimental conditions showed that AI-generated ideas were consistently rated as more novel than human-generated ideas. The finding held up after correction for multiple hypothesis testing and across several statistical tests.
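
In rough terms, the comparison behind this claim is a set of two-sample tests on reviewer ratings, with a correction for testing several topics at once. The Python sketch below illustrates that kind of analysis; the topics, rating values, group sizes and the specific choice of Welch's t-test with a Bonferroni correction are illustrative assumptions, not the study's actual data or procedure.

```python
# Illustrative comparison of novelty ratings for human vs. AI ideas,
# tested per topic and corrected for multiple comparisons.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Hypothetical 1-10 novelty ratings per condition, grouped by topic (made up).
ratings = {
    "coding": {"human": rng.normal(4.8, 1.2, 40), "ai": rng.normal(5.6, 1.2, 40)},
    "safety": {"human": rng.normal(4.9, 1.1, 40), "ai": rng.normal(5.4, 1.1, 40)},
    "multilinguality": {"human": rng.normal(5.0, 1.0, 40), "ai": rng.normal(5.5, 1.0, 40)},
}

# Welch's t-test per topic (no equal-variance assumption), then a Bonferroni
# correction because several topics are tested at once.
alpha = 0.05
k = len(ratings)
for topic, groups in ratings.items():
    _, p = ttest_ind(groups["ai"], groups["human"], equal_var=False)
    print(f"{topic}: p = {p:.4f}, significant after Bonferroni: {p < alpha / k}")
```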

Chart: Comparison of novelty ratings for ideas from humans, AI, and AI with human revision across 7 NLP topics.
AI ideas with human revision achieved the highest novelty scores, highlighting the potential of human-AI collaboration. However, further research on the feasibility of these ideas is needed. | Image: Si et al.

However, the study suggested that the increased novelty may come at a slight cost to feasibility, although the sample size was too small to confirm this effect conclusively.
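
A quick power calculation shows why a sample of this size can leave a small feasibility effect unresolved. The sketch below uses statsmodels; the assumed effect size (Cohen's d = 0.2) and group size of about 100 reviews per condition are illustrative guesses, not the paper's numbers.

```python
# Back-of-the-envelope power check for detecting a small difference in
# feasibility ratings between two conditions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power to detect a small gap (d = 0.2) with ~100 reviews per condition.
power = analysis.solve_power(effect_size=0.2, nobs1=100, alpha=0.05)
print(f"Power at n=100 per group: {power:.2f}")  # far below the usual 0.8 target

# Reviews per group needed to detect that same small effect with 80% power.
n_needed = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(f"Reviews needed per group: {n_needed:.0f}")
```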


Potential drawbacks of AI-generated ideas

The study identified several recurring issues with AI-generated research ideas:

1. Lack of implementation details
2. Incorrect use of datasets
3. Missing or inappropriate benchmarks
4. Unrealistic assumptions
5. Excessive resource requirements
6. Insufficient motivation
7. Inadequate consideration of existing best practices

In contrast, human-generated ideas tended to be more grounded in existing research and practical considerations, though possibly less innovative. Human ideas often focused on common problems or datasets and prioritized feasibility over novelty.

Study methods and future directions

The study used GPT-3.5, GPT-4 and Llama-2-70B to generate the AI ideas, combined with retrieval-augmented generation (RAG) to pull in external sources. To reduce bias, the researchers standardized the writing style of human and AI ideas and matched the topic distributions. More advanced models such as GPT-4o, Llama 3 or o1 were not tested.
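
The paper's exact pipeline isn't reproduced here, but retrieval-augmented idea generation generally follows a retrieve-then-prompt pattern. The sketch below is a hypothetical outline: search_papers and generate are placeholder functions standing in for a literature-search API and an LLM call, and the prompt template is invented for illustration.

```python
# Rough sketch of retrieval-augmented idea generation (not the authors' code).
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    abstract: str

def search_papers(topic: str, k: int = 10) -> list[Paper]:
    """Placeholder for a literature search (e.g. querying a paper database)."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder for a call to whichever LLM backs the idea generator."""
    raise NotImplementedError

def propose_idea(topic: str) -> str:
    # 1. Retrieve related work so the model grounds its proposal in prior papers.
    papers = search_papers(topic)
    context = "\n".join(f"- {p.title}: {p.abstract}" for p in papers)

    # 2. Ask for a structured idea in a fixed template, which also helps
    #    standardize the writing style between AI and human write-ups.
    prompt = (
        f"Related work on {topic}:\n{context}\n\n"
        "Propose one novel research idea. Use the sections: "
        "Problem, Motivation, Proposed Method, Experiment Plan."
    )
    return generate(prompt)
```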

The research team proposed several ways to build on their findings: comparing AI ideas with accepted papers from top conferences, having researchers develop both AI and human ideas into complete projects, and exploring whether code-generating AI agents can automate the execution of ideas.

Recommendation

Existing examples of AI contributions to research include Google's AI-accelerated chips in Pixel smartphones and applications in medicine.

Summary
  • A Stanford University study of over 100 NLP researchers found that AI-generated research ideas were rated by experts as significantly more novel than ideas from human experts, but possibly at the expense of feasibility.
  • The most common weaknesses of AI ideas were vague implementation details, incorrect use of datasets, lack of benchmarks, unrealistic assumptions, and insufficient consideration of existing best practices.
  • Human ideas were more grounded in existing research, but less innovative. The researchers are planning further research to deepen their findings, for example by comparing AI ideas with accepted papers from a top conference.