AI-hallucinated citations are creeping into papers that shape clinical guidelines, researchers warn

Key Points

  • Researchers at Columbia University and other institutions show in a study that the rate of fabricated references in biomedical papers has increased more than twelvefold since 2023.
  • The authors see language models like ChatGPT as a likely cause. The fake sources look deceptively real and are especially risky because they often show up in review articles that shape clinical guidelines.
  • As a countermeasure, the researchers call for automated reference checks before publication and retroactive screening of already-published papers. Platforms like arXiv have already introduced initial sanctions for AI-related errors.

An audit of 2.5 million biomedical papers shows that made-up references in peer-reviewed research have become a systemic issue. Since 2023, the rate has increased more than twelvefold.

Researchers at Columbia University and other institutions have published, in The Lancet, the largest-ever review of citations in biomedical papers. The team, led by Maxim Topaz, scanned 2.47 million papers from the open PubMed Central archive published between January 2023 and February 2026.

Out of 97.1 million references checked, 4,046 were flagged as fabricated, spread across 2,810 papers. A reference counted as fabricated if its listed title couldn't be found in any of four major databases: PubMed, Crossref, OpenAlex, and Google Scholar.
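
The flagging rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual pipeline: the helper names (normalize_title, make_index, is_flagged) are hypothetical, the title normalization is a guess at what a reasonable matcher might do, and the four databases are replaced by small in-memory stand-ins with toy titles rather than real API calls.

```python
import re
from typing import Callable, Iterable

# Hypothetical helper names throughout; the study's real matching logic is not
# described in the article.

def normalize_title(title: str) -> str:
    """Lowercase and reduce punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def make_index(titles: Iterable[str]) -> Callable[[str], bool]:
    """Build an in-memory stand-in for one database's title lookup."""
    index = {normalize_title(t) for t in titles}
    return lambda key: key in index

def is_flagged(title: str, lookups: Iterable[Callable[[str], bool]]) -> bool:
    """Flag a reference only if none of the lookups finds its title."""
    key = normalize_title(title)
    return not any(lookup(key) for lookup in lookups)

# Stand-ins for the four databases named in the article (toy data, not real records).
databases = [
    make_index(["Gut microbiome and surgical outcomes"]),  # "PubMed"
    make_index([]),                                        # "Crossref"
    make_index(["A survey of citation checking"]),         # "OpenAlex"
    make_index([]),                                        # "Google Scholar"
]

print(is_flagged("Gut Microbiome and Surgical Outcomes.", databases))  # False: one index has it
print(is_flagged("A title that exists nowhere", databases))            # True: no index has it
```

The key design point is the "any of four" logic: a single hit in any database clears the reference, so the rule errs on the side of not flagging.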

Flat through 2023, then a sharp spike

The timeline tells the story. Throughout 2023, the rate held steady at about four fabricated references per 10,000 papers. Starting in mid-2024, it climbed fast, hitting 51.3 per 10,000 by the end of 2025 and reaching 56.9 per 10,000 in the first seven weeks of 2026. That's more than twelve times the baseline.
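
The "more than twelvefold" figure follows directly from the rates quoted above. The short Python check below reproduces it. The baseline is given only as "about four", so the ratios are approximate, and the overall rate and share of affected papers are derived here from the article's rounded totals rather than reported by the authors.

```python
# Rates quoted in the article, in fabricated references per 10,000 papers.
baseline_2023 = 4.0   # "about four", so treat the ratios below as approximate
end_of_2025 = 51.3
early_2026 = 56.9

def fold_increase(rate: float, baseline: float) -> float:
    """Ratio of a later rate to the baseline rate."""
    return rate / baseline

# Totals quoted in the article (paper count is rounded to 2.47 million).
flagged_references = 4_046
affected_papers = 2_810
papers_scanned = 2_470_000

# Derived values, averaged over the whole January 2023 to February 2026 window.
overall_rate = flagged_references / papers_scanned * 10_000
affected_share_pct = affected_papers / papers_scanned * 100

print(round(fold_increase(end_of_2025, baseline_2023), 1))  # 12.8
print(round(fold_increase(early_2026, baseline_2023), 1))   # 14.2
print(round(overall_rate, 1))                               # 16.4
print(round(affected_share_pct, 2))                         # 0.11
```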

The authors suspect a link to the widespread use of language models like ChatGPT, which took off in late 2022. Since papers typically take 100 to 200 days from submission to publication, AI-generated text wouldn't show up in PubMed Central in large numbers until mid-2024. The authors don't rule out other causes, though, including increased paper-mill activity or changes in indexing practices.

From summer 2024, there was a rapid increase in hallucinated references in the papers examined. Image: The Lancet

The real problem: these fake references are hard to spot. They match the paper's topic, follow correct formatting, credit real researchers, and carry plausible publication years. In one urology paper, 18 of 30 checked references were fabricated, yet all closely matched the narrow surgical subject.

The researchers also found patterns pointing to coordinated paper-mill activity. Two authors appeared in eleven papers from the same surgical journal, with a total of 15 fabricated references on topics like CRISPR diagnostics and the gut microbiome.

Scientific infrastructure needs to catch up with AI

At the time of the audit, 98.4 percent of the affected papers had received no response from their publishers. Review articles were hit hardest, showing a 57 percent higher fabrication rate than other paper types. That's especially worrying, the authors say, because reviews often serve as the basis for clinical guidelines. If a guideline cites a paper with partly fabricated sources, the entire evidence chain behind treatment decisions is compromised.

The scientific community has started adapting, but the response remains patchy. arXiv tightened its sanctions for unchecked LLM output in manuscripts, including hallucinated sources, threatening offending authors with a one-year ban. An analysis of accepted NeurIPS 2025 papers had already shown that even top AI conferences can't reliably catch fabricated citations. One possible countermeasure is CiteAudit, an open-source system for automated citation checking, though it also shows how poorly commercial language models do at catching their own reference problems.

The researchers recommend four steps: automated reference checks before peer review, integrity metadata in article datasets, retroactive screening of already-published papers, and a dedicated "fabricated references" category in research integrity databases. The authors themselves used Claude for code development and grammar checking during the study.

Source: The Lancet