Meta's latest AI model is designed to automatically check hundreds of thousands of Wikipedia source citations at once and suggest better sources where needed.
Wikipedia grows by around 17,000 articles per month. They range from short stubs to long essays, but all share one requirement: every claim in the text must be backed by a reliable, verifiable source.
The bigger Wikipedia gets, the harder it becomes for the community to keep this quality promise and keep sources up to date. Meta's latest AI model, "Side," could support this work in the future.
Knowledge sources from the WWW
According to Meta, the open-source AI model Side can automatically check hundreds of thousands of source references and evaluate whether, and how well, the cited source actually supports the claim in the Wikipedia article.
The system is designed to check sources automatically in the background and alert Wikipedia editors to citations that may be incorrect or irrelevant, saving editors the trouble of checking every source manually. Side can also suggest better sources where citations are missing or outdated.
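The check-and-flag workflow described above can be sketched in a few lines. This is a toy illustration, not Meta's code: the word-overlap scoring function is a deliberately simple stand-in for Side's learned verification model, and all names and URLs are hypothetical.

```python
# Toy sketch of a citation-checking loop: score how well each cited
# source supports its claim, and flag low-scoring citations for editors.
# verification_score() is a crude stand-in for a learned model.

def verification_score(claim: str, source_text: str) -> float:
    """Toy stand-in: fraction of claim words that appear in the source."""
    claim_words = set(claim.lower().split())
    source_words = set(source_text.lower().split())
    return len(claim_words & source_words) / len(claim_words)

def flag_citations(citations, threshold=0.5):
    """Return (url, score) pairs whose source poorly supports the claim."""
    flagged = []
    for claim, source_text, url in citations:
        score = verification_score(claim, source_text)
        if score < threshold:
            flagged.append((url, score))
    return flagged

citations = [
    ("Blackbeard died in 1718",
     "The pirate Blackbeard died in battle in 1718",
     "https://example.org/a"),
    ("Blackbeard died in 1718",
     "A recipe for lemon cake",
     "https://example.org/b"),
]
print(flag_citations(citations))  # → [('https://example.org/b', 0.0)]
```

In the real system the scoring model is trained, and a flagged citation would be surfaced to a human editor rather than changed automatically.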
Side draws its knowledge from a text dataset containing information from 134 million publicly available web pages. According to Meta, the indexes developed for the underlying Sphere project contain 40 times more content than other Wikipedia indexes.
During training, Meta taught the AI to pick out a single source for each of four million Wikipedia statements from the vast pool of Web pages. When searching for sources, Meta says the models "create and compare mathematical representations of the meanings of entire statements rather than of individual words."
For long documents, the model can thus narrow in on the passages most relevant to the Wikipedia statement before recommending a source URL.
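The idea of comparing "representations of the meanings of entire statements" can be illustrated with a minimal sketch. Side uses learned dense embeddings; here a toy bag-of-words vector and cosine similarity stand in, purely to show how whole-statement comparison ranks candidate passages. All example texts are invented.

```python
# Minimal sketch of statement-level passage ranking (not Meta's code).
# A toy bag-of-words "embedding" substitutes for a learned encoder.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a learned sentence encoder that maps a whole
    # statement to a vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_passage(statement: str, passages: list[str]) -> str:
    """Return the passage most similar to the whole statement."""
    q = embed(statement)
    return max(passages, key=lambda p: cosine(q, embed(p)))

statement = "the eiffel tower was completed in 1889"
passages = [
    "paris hosts many famous restaurants and cafes",
    "construction of the eiffel tower was completed in 1889",
]
print(best_passage(statement, passages))
# → "construction of the eiffel tower was completed in 1889"
```

The key design point the quote alludes to: matching operates on the meaning of the full statement rather than on individual keywords, so paraphrased evidence can still rank highly.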
In the long term, Meta also wants to check facts and evaluate source quality
In the next step, Meta hopes to extend the verification principle. Future models are expected to evaluate the quality of retrieved documents, recognize possible contradictions between statements, prioritize trustworthy sources, and transparently indicate when there is no convincing evidence for a statement.
"In the real world, these models could muzzle harmful content and, when combined with a well-designed UI, enhance people’s digital literacy and critical thinking skills," Meta writes.
The system could, for example, be integrated into future editorial software to check facts, correct errors, and add text. Ideally, it would be able to access information across all media formats and in many languages, the researchers write.
Open source experiment as a building block for AI future
Meta's research team also offers an outlook on how the Side project could contribute to AI progress overall: through extensive training on complex content, AI could develop a better understanding of the world, resulting in smarter and more flexible algorithms.
As a pre-trained model, the fact-checking system could bring advances in language processing, information retrieval in question-answering systems, and few-shot learning, i.e., adapting a large AI model to specific applications with limited data.
Meta makes Side freely available as open source and offers a public demo on Wikipedia citations. Meta has not partnered with Wikipedia operator Wikimedia for Side, and it is not known whether or how the system will be used for Wikipedia.