Stanford AI experiment "STORM" generates Wikipedia-style articles

Stanford University researchers have developed STORM, an AI system that automates the preparation phase of writing Wikipedia-like articles. The system independently researches a topic, gathers sources, and creates a detailed outline.

Writing long, well-researched articles like those on Wikipedia is challenging even for experienced authors. Before the actual writing can begin, an author has to research and plan the piece thoroughly in a preparation phase.

Stanford University researchers have now developed STORM, an AI system that automates this preparation phase. STORM stands for "Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking." It breaks down the task into two steps: First, it researches a topic, collects references, and creates an outline. Then it uses the outline and references to write the full article.

Process of the Stanford experiment "STORM" | Image: Stanford STORM Project

The core of STORM is a mechanism that prompts the AI language model to ask effective questions to research a topic. STORM uses two strategies for this:

Perspective-driven questioning: STORM discovers different perspectives by analyzing Wikipedia articles on similar topics. These perspectives then serve as prior knowledge to generate more targeted questions.
Simulated conversation: STORM simulates a dialogue between a Wikipedia author and an expert on the topic. The expert's answers are based on "trustworthy internet sources" provided by the AI search engine you.com. This allows the language model to iteratively update its understanding of the topic and ask follow-up questions.
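The two strategies above can be sketched as a loop: for each perspective, a simulated writer asks a question, a simulated expert answers from retrieved sources, and the growing history informs the next question. The following is a minimal illustration under stated assumptions, not the researchers' implementation. The names `ask`, `answer`, `Conversation`, `simulate_conversation`, and `research` are hypothetical; in a real system, `ask` would wrap a language model call and `answer` would wrap a search backend.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical signatures, assumed for this sketch only:
# ask(topic, perspective, history) -> next question ("" when there is nothing left to ask)
# answer(question) -> (answer text, list of source URLs)
AskFn = Callable[[str, str, List[Tuple[str, str]]], str]
AnswerFn = Callable[[str], Tuple[str, List[str]]]


@dataclass
class Conversation:
    perspective: str
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer) pairs
    sources: List[str] = field(default_factory=list)            # de-duplicated references


def simulate_conversation(topic: str, perspective: str, ask: AskFn,
                          answer: AnswerFn, max_turns: int = 3) -> Conversation:
    """Run one writer/expert dialogue from a single perspective."""
    convo = Conversation(perspective)
    for _ in range(max_turns):
        # The writer sees the history so far, so it can ask follow-up questions.
        question = ask(topic, perspective, convo.turns)
        if not question:
            break
        reply, urls = answer(question)
        convo.turns.append((question, reply))
        for url in urls:
            if url not in convo.sources:
                convo.sources.append(url)
    return convo


def research(topic: str, perspectives: List[str], ask: AskFn,
             answer: AnswerFn, max_turns: int = 3) -> List[Conversation]:
    """Run one simulated conversation per discovered perspective."""
    return [simulate_conversation(topic, p, ask, answer, max_turns)
            for p in perspectives]
```

Passing the model and the search engine in as plain callables keeps the sketch runnable with stubs and avoids assuming any particular API.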

Based on the collected knowledge and the language model's internal knowledge, STORM then creates a detailed outline. This is then formulated section by section into a complete article. The system is reminiscent of Perplexity Pages.
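The section-by-section writing step might look roughly like the sketch below. It assumes the outline is a flat list of headings and that collected notes are already grouped by heading; both are simplifications made for illustration. `write_article` and `write_section` are hypothetical names, and `write_section` stands in for a language model call.

```python
from typing import Callable, Dict, List

# Hypothetical signature, assumed for this sketch only:
# write_section(topic, heading, notes) -> text of that section
WriteFn = Callable[[str, str, List[str]], str]


def write_article(topic: str, outline: List[str],
                  notes_by_heading: Dict[str, List[str]],
                  write_section: WriteFn) -> str:
    """Turn an outline into an article, one section at a time."""
    parts = [topic]
    for heading in outline:
        # Each section only receives the notes collected for its own heading.
        notes = notes_by_heading.get(heading, [])
        parts.append(heading)
        parts.append(write_section(topic, heading, notes))
    return "\n\n".join(parts)
```

Writing each section separately keeps the material handed to the model small and tied to that section's references.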

Limited timeliness, but a good overview

To evaluate the system, the researchers created the FreshWiki dataset, which contains recent, high-quality Wikipedia articles. They defined metrics to assess the quality of the generated outlines and articles compared to human-written articles.

In an expert evaluation with experienced Wikipedia authors, STORM performed better than a comparison system that generates articles based on search results. The articles produced by STORM were rated as better structured (25% absolute increase) and with broader coverage (10% increase).

STORM web interface | Image: The Decoder

However, the expert survey also uncovered new challenges: bias in internet sources sometimes carries over into the generated articles, and the language model sometimes draws connections between facts that are actually unrelated. In initial tests, STORM produced a good overview of the topic "What are the current political trends in East Germany?" but omitted the recent results of the state elections in Brandenburg, Thuringia, and Saxony.

Overall, the surveyed Wikipedia authors agreed that STORM could help them in the preparation phase when writing new articles. While the quality of the machine-generated texts does not yet reach the level of carefully human-edited articles, the researchers see their system as a promising approach to facilitate and accelerate the creation of well-researched articles. It can certainly be helpful to have AI support in preparing research projects.

However, about 30% of the surveyed Wikipedia editors believe that STORM might not be a useful tool for the Wikipedia community in the future.
