
Meta's Nougat is an AI text recognition model that can reliably convert scientific PDFs to text.

Researchers at Meta have unveiled Nougat (Neural Optical Understanding for Academic Documents), an AI model that converts PDF images of scientific articles into structured, machine-readable text. Nougat aims to bridge the gap between human-readable PDF documents and machine-readable text, improving access to scientific knowledge.

Based on a variant of Vision Transformer for image analysis, Nougat performs optical character recognition (OCR) tailored for processing scientific documents. Unlike traditional OCR engines, which work line-by-line, Nougat processes the entire page. According to the team, this makes it easier to handle features such as superscripts and subscripts in mathematical formulas, which have often been transcribed incorrectly in the past.
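
To illustrate what processing the entire page can mean for a Vision Transformer-style model: such models typically cut the input image into a grid of fixed-size patches and attend over all of them together, so a superscript and the symbol it belongs to are visible at the same time. The sketch below shows only that patch-splitting step on a toy image. The function name, patch size, and image dimensions are illustrative assumptions, not details of Nougat's implementation.

```python
def split_into_patches(page, patch_size):
    """Split a 2D page image (list of pixel rows) into non-overlapping square patches.

    Patches are returned in row-major order; each patch is a list of pixel rows.
    """
    height = len(page)
    width = len(page[0])
    if height % patch_size or width % patch_size:
        raise ValueError("page dimensions must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size] for row in page[top:top + patch_size]]
            patches.append(patch)
    return patches


# Toy 4x6 "page" of grayscale values; a real page image would be far larger.
page = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = split_into_patches(page, patch_size=2)
print(len(patches))  # 6 patches: a 2x3 grid covering the whole page
print(patches[0])    # [[0, 1], [6, 7]]
```

A line-by-line OCR engine would instead see each text line in isolation, which is one reason vertically offset symbols are easy to misplace.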

For training, the team used a dataset of PDFs of scientific articles from sources such as arXiv and PubMed Central with the corresponding LaTeX source code from the author(s). The dataset consists of more than 8 million pages.

Meta's Nougat significantly outperforms existing alternatives

In tests, Nougat achieved high accuracy in extracting text, formulas and tables from pages of scientific articles. For continuous text, it achieved a BLEU score of over 91% and an accuracy of over 96%. Performance for formulas and tables was lower at just over 75%, but still significantly more reliable than alternatives such as GROBID, whose accuracy for mathematical formulas is just under 11%.
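
For readers unfamiliar with BLEU: it measures how many word n-grams of the generated text also appear in a reference text, with a penalty for output that is too short. The following is a simplified, single-reference sketch with whitespace tokenization and no smoothing. It is not the evaluation code used by the Meta team, and the function names and example strings are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip: each candidate n-gram counts at most as often as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precision_sum += math.log(matched / total)
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the energy is given by E = m c ^ 2"
candidate = "the energy is given by E = m c 2"  # the superscript marker was lost

print(simple_bleu(reference, reference))  # 1.0 for a perfect match
print(simple_bleu(candidate, reference))  # lower, because of the dropped token
```

The example also hints at why formulas are harder to score well on than running text: a single lost superscript marker breaks several n-grams at once.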

According to Meta, Nougat is a promising solution for improving access to scientific knowledge by converting PDF research papers into structured, machine-readable text. This could make millions of scientific articles more accessible by bridging the gap between PDF and text.

However, challenges remain in managing cross-document consistency and avoiding repetitive text loops during generation, the team says.

The code and models are available on GitHub and are intended to accelerate future work in scientific document processing. More information and examples are available on Nougat's project page.

Summary
  • Meta researchers developed Nougat, an AI model that reliably converts scientific PDFs into machine-readable text, improving access to scientific knowledge.
  • Based on vision transformers, Nougat achieves high accuracy in extracting text, formulas, and tables from scientific articles. It significantly outperforms existing alternatives.
  • Code and models are available on GitHub.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.