
A new study reveals major problems with how AI search engines handle news citations, even when they have formal agreements with publishers.


While nearly 25% of Americans now use AI search engines instead of traditional search tools, according to recent survey data, these systems often fail at basic source attribution. Researchers at Columbia University's Tow Center for Digital Journalism tested eight AI search engines, including ChatGPT, Perplexity, and Google Gemini, asking each to identify the headline, original publisher, publication date, and URL for excerpts taken from randomly selected news articles.

The results paint a concerning picture: more than 60% of queries received incorrect answers. Perplexity emerged as the top performer with a 37% error rate, while Grok 3 struggled significantly, misattributing 94% of citations.

Figure: Horizontal bar chart showing how accurately various generative search tools identified the source article and URL for text excerpts. The study found very few instances where the AI tools provided completely accurate attributions. | Image: Columbia Journalism Review

Paid services perform worse than free versions

Surprisingly, paid services like Perplexity Pro and Grok 3 performed worse than their free counterparts. While they attempted to answer more queries, they were more likely to provide incorrect information instead of acknowledging when they didn't know something.

Figure: Bar chart showing correct, incorrect, and unidentified source references for extracted text snippets, illustrating each generative search tool's hit rate in identifying source articles and their metadata. | Image: Columbia Journalism Review

Several systems also ignored publishers' Robots Exclusion Protocol settings. For example, Perplexity accessed National Geographic content despite the publisher explicitly blocking its crawlers.
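The Robots Exclusion Protocol works through a plain-text robots.txt file at a site's root, which crawlers are expected to check and honor voluntarily. A publisher blocking an AI crawler might use rules along these lines (a minimal sketch, assuming the crawler identifies itself with its documented user-agent string, such as Perplexity's PerplexityBot):

```
# robots.txt at https://example.com/robots.txt
# Block Perplexity's crawler from the entire site
User-agent: PerplexityBot
Disallow: /

# All other crawlers may access everything
User-agent: *
Disallow:
```

Because compliance is voluntary rather than technically enforced, a crawler that simply ignores these directives, as the study suggests some did, can still fetch the blocked content.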

Figure: Matrix diagram showing which web content remained accessible to various search engines and crawling methods despite blocking; access varies significantly depending on the search engine and method used. | Image: Columbia Journalism Review

Publisher agreements don't fix attribution issues

Even formal partnerships between publishers and AI companies haven't resolved the attribution problems. Despite Hearst's agreement with OpenAI, ChatGPT correctly identified only one in ten San Francisco Chronicle articles. Perplexity frequently cited syndicated versions of Texas Tribune articles instead of the originals.

The study found that AI search engines often directed users to syndication platforms like Yahoo News rather than original sources. In more than half of cases, Grok 3 and Google Gemini created URLs that didn't exist.

Figure: Bar chart comparing the accuracy of generative search tools in identifying an article's origin, publication, and URL, grouped by licensing status; the data shows that formal licensing agreements haven't improved the accuracy of content attribution. | Image: Columbia Journalism Review

Time Magazine's COO Mark Howard notes that AI companies are working to improve their systems but cautions against expecting perfect accuracy from current free services: "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."

A separate BBC study in February identified similar problems with AI assistants handling news queries, including factual errors and poor sourcing.

Summary
  • A recent study by Columbia University's Tow Center for Digital Journalism evaluated eight prominent AI search engines, including ChatGPT, Perplexity, and Google Gemini, to assess their accuracy in citing news content.
  • The results revealed significant limitations, with the search engines providing incorrect answers to over 60% of the queries posed.
  • Contrary to expectations, paid premium models performed worse than their free counterparts, and several chatbots appeared to ignore the Robots Exclusion Protocol settings, often directing users to syndication platforms rather than the original sources.
Jonathan writes for THE DECODER about how AI tools can make our work and creative lives better.