Stanford University researchers have developed STORM, an AI system that automates the preparation phase of writing Wikipedia-like articles. The system independently researches a topic, gathers sources, and creates a detailed outline.
Writing long, well-researched articles like those on Wikipedia is challenging even for experienced authors. Before the actual writing can begin, authors have to research the topic thoroughly and plan the article's structure.
Stanford University researchers have now developed STORM, an AI system that automates this preparation phase. STORM stands for "Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking." It breaks down the task into two steps: First, it researches a topic, collects references, and creates an outline. Then it uses the outline and references to write the full article.
The core of STORM is a mechanism that prompts the AI language model to ask effective questions to research a topic. STORM uses two strategies for this:
- Perspective-driven questioning: STORM discovers different perspectives by analyzing Wikipedia articles on similar topics. These perspectives then serve as prior knowledge to generate more targeted questions.
- Simulated conversation: STORM simulates a dialogue between a Wikipedia author and an expert on the topic. The expert's answers are based on "trustworthy internet sources" provided by the AI search engine you.com. This allows the language model to iteratively update its understanding of the topic and ask follow-up questions.
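The two strategies above can be pictured as a small loop: for each perspective, a simulated writer keeps asking questions, and a simulated expert answers from retrieved sources. The following is a minimal sketch under stated assumptions, not STORM's actual code. The names `Conversation`, `simulate_conversation`, `research`, `demo_ask`, and `demo_answer` are hypothetical, and the two demo functions are stand-ins for what would be a language-model call and a search-engine call in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical signatures, assumed for illustration only:
#   ask(topic, perspective, history) -> next question ("" means stop)
#   answer(topic, question)          -> (answer text, list of source URLs)
AskFn = Callable[[str, str, List[Tuple[str, str]]], str]
AnswerFn = Callable[[str, str], Tuple[str, List[str]]]


@dataclass
class Conversation:
    perspective: str
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)
    sources: List[str] = field(default_factory=list)


def simulate_conversation(topic: str, perspective: str, ask: AskFn,
                          answer: AnswerFn, max_turns: int = 3) -> Conversation:
    """Run one writer/expert dialogue from a single perspective."""
    convo = Conversation(perspective)
    for _ in range(max_turns):
        # The writer sees the earlier turns, so it can ask follow-up questions.
        question = ask(topic, perspective, convo.turns)
        if not question:
            break
        reply, urls = answer(topic, question)
        convo.turns.append((question, reply))
        for url in urls:
            if url not in convo.sources:  # keep each reference once
                convo.sources.append(url)
    return convo


def research(topic: str, perspectives: List[str], ask: AskFn,
             answer: AnswerFn, max_turns: int = 3) -> List[Conversation]:
    """Hold one simulated conversation per perspective."""
    return [simulate_conversation(topic, p, ask, answer, max_turns)
            for p in perspectives]


def demo_ask(topic: str, perspective: str, history: List[Tuple[str, str]]) -> str:
    # Stand-in for a language-model call; stops after two questions.
    if len(history) >= 2:
        return ""
    return f"[{perspective}] question {len(history) + 1} about {topic}"


def demo_answer(topic: str, question: str) -> Tuple[str, List[str]]:
    # Stand-in for a search-backed expert; the URL is a placeholder.
    url = "https://example.org/" + question.replace(" ", "-")
    return f"answer to: {question}", [url]


if __name__ == "__main__":
    for convo in research("solar sails", ["engineer", "historian"],
                          demo_ask, demo_answer):
        print(convo.perspective, len(convo.turns), len(convo.sources))
```

Passing the question and answer functions in as parameters keeps the loop independent of any particular model or search provider, which is a design choice of this sketch rather than something the article describes.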
Based on the collected knowledge and the language model's internal knowledge, STORM then creates a detailed outline. This is then formulated section by section into a complete article. The system is reminiscent of Perplexity Pages.
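The outline-then-write step described above can be sketched as follows. This is an illustrative assumption about the control flow only: `write_article` and `demo_write_section` are hypothetical names, the outline is simplified to a flat list of headings, and the section writer is a placeholder for a language-model call.

```python
from typing import Callable, Dict, List

# Hypothetical signature, assumed for illustration:
#   write_section(topic, heading, references) -> section text
SectionWriter = Callable[[str, str, List[str]], str]


def write_article(topic: str, outline: List[str],
                  references: Dict[str, List[str]],
                  write_section: SectionWriter) -> str:
    """Turn an outline into an article, one section at a time."""
    parts = [topic]
    for heading in outline:
        # Each section only receives the references collected for it.
        refs = references.get(heading, [])
        body = write_section(topic, heading, refs)
        parts.append(f"{heading}\n{body}")
    return "\n\n".join(parts)


def demo_write_section(topic: str, heading: str, refs: List[str]) -> str:
    # Stand-in for a language-model call that drafts one section.
    cited = ", ".join(refs) if refs else "no sources collected"
    return f"Placeholder text on {heading} of {topic} (sources: {cited})."


if __name__ == "__main__":
    print(write_article(
        "Solar sails",
        ["History", "Physics"],
        {"History": ["https://example.org/a"]},
        demo_write_section,
    ))
```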
Limited timeliness, but a good overview
To evaluate the system, the researchers created the FreshWiki dataset, which contains recent, high-quality Wikipedia articles. They defined metrics to assess the quality of the generated outlines and articles compared to human-written articles.
In an expert evaluation with experienced Wikipedia authors, STORM performed better than a comparison system that generates articles based on search results. The articles produced by STORM were rated as better structured (a 25% absolute increase) and as having broader coverage (a 10% increase).
However, the expert survey also uncovered new challenges: Sometimes the bias of internet sources carries over into the generated articles. Additionally, the language model sometimes draws connections between facts that are actually unrelated. In initial tests, STORM created a good overview of the topic "What are the current political trends in East Germany?" but left out the recent results of the state elections in Brandenburg, Thuringia, and Saxony.
Overall, the surveyed Wikipedia authors agreed that STORM could help them in the preparation phase when writing new articles. While the quality of the machine-generated texts does not yet reach the level of carefully human-edited articles, the researchers see their system as a promising approach to facilitate and accelerate the creation of well-researched articles. It can certainly be helpful to have AI support in preparing research projects.
However, about 30% of the surveyed Wikipedia editors doubt that STORM will be a useful tool for the Wikipedia community in the future.