Researchers find AI text is making the internet more uniform and weirdly cheerful
A large-scale analysis of websites from the Internet Archive shows just how much AI text already saturates the web. According to the researchers, though, the actual effects look quite different from what the public assumes.
About 35 percent of all newly published websites were fully or partially AI-generated by mid-2025. That's the headline finding of a study by researchers at Imperial College London, the Internet Archive, and Stanford University. Before ChatGPT launched in late 2022, that share was essentially zero.
The team pulled a representative sample of English-language websites from the Internet Archive's Wayback Machine, covering 33 monthly intervals from August 2022 to May 2025. To spot AI text, they used the Pangram v3 detector, which came out on top in their own robustness tests across five dimensions.
The researchers put six common hypotheses about AI's impact on the web to the test. Only two held up statistically: "semantic contraction" and the "positivity shift."
Semantic contraction refers to a narrowing of the range of ideas online. The study found that AI-generated texts were 33 percent more semantically similar to each other than human-written content. The researchers take this as a sign that language models gravitate toward the mean of their training data, potentially shrinking the "Overton window" of online discourse.
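To make "semantically similar" concrete, here is a minimal sketch that assumes each text has already been mapped to an embedding vector and that similarity is measured as mean pairwise cosine similarity. The article does not say which embedding model or metric the study used, so the function names and the toy vectors below are illustrative assumptions, not the researchers' method or data.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy 3-dimensional "embeddings" (made-up numbers, not study data).
# The first group points in different directions, the second clusters together.
human_like = [[1.0, 0.0, 0.2], [0.2, 1.0, 0.0], [0.0, 0.2, 1.0]]
ai_like = [[1.0, 0.1, 0.0], [1.0, 0.0, 0.1], [0.9, 0.1, 0.1]]

human_sim = mean_pairwise_cosine(human_like)
ai_sim = mean_pairwise_cosine(ai_like)
print(f"human-like: {human_sim:.3f}, AI-like: {ai_sim:.3f}")
```

A higher average for one group means its texts sit closer together in embedding space, which is the pattern the study reports for AI-generated content.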
The positivity shift shows up as an increasingly artificial upbeat tone. AI texts scored 107 percent higher on positive sentiment than fully human-written content. The researchers chalk this up to the well-known tendency of language models toward sycophancy and overoptimism. A discourse dominated by sanitized, relentlessly cheerful prose could push human dissent to the margins, they argue. Co-author Jonas Dolezal, an AI researcher at Stanford, wants AI models to have more friction and a sharper voice. "Rather than forcing models to be perfectly compliant and agreeable, allowing them to have a more distinct personality or 'friction' might help them act as a creative partner rather than a replacement for human voice," he told 404 Media. The study measures correlations, not causation.
No evidence of more factual errors online
The four other hypotheses didn't hold up. The data showed no disappearance of individual writing styles, no decline in external links, and no drop in information density. The study also couldn't show an increase in factual errors, though that finding rests on much shakier methodological ground than the others.
To test the so-called truth decay hypothesis, the researchers had GPT-4o-mini automatically pull verifiable claims from the websites, up to five per page. Fifty human annotators then checked those claims against outside sources, rating them as supported, refuted, not enough evidence, or conflicting evidence. The metric was the share of clearly refuted statements. The researchers found no statistically significant correlation with the share of AI content.
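The metric described above can be sketched roughly as follows. This assumes each page comes with an estimated AI share and a list of annotator ratings, and it uses a plain Pearson coefficient as a stand-in, since the article does not name the statistical test the researchers applied. The rating spellings, helper names, and page data are all hypothetical.

```python
import math
from collections import Counter

# The four ratings named in the article; the snake_case spellings are an assumption.
RATINGS = {"supported", "refuted", "not_enough_evidence", "conflicting_evidence"}

def refuted_share(ratings):
    """Share of a page's checked claims that were rated 'refuted'."""
    unknown = set(ratings) - RATINGS
    if unknown:
        raise ValueError(f"unknown ratings: {sorted(unknown)}")
    if not ratings:
        raise ValueError("page has no checked claims")
    return Counter(ratings)["refuted"] / len(ratings)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical pages: (estimated AI share, ratings of up to five claims).
pages = [
    (0.0, ["supported", "supported", "refuted", "supported"]),
    (0.3, ["supported", "not_enough_evidence", "supported"]),
    (0.7, ["refuted", "supported", "supported", "conflicting_evidence", "supported"]),
    (1.0, ["supported", "supported", "not_enough_evidence", "supported", "supported"]),
]

ai_shares = [share for share, _ in pages]
refuted_shares = [refuted_share(ratings) for _, ratings in pages]
r = pearson_r(ai_shares, refuted_shares)
print(f"refuted shares: {refuted_shares}, Pearson r = {r:.2f}")
```

A coefficient alone says nothing about significance. With only a few hundred pages, a real analysis would also need a test or confidence interval before drawing any conclusion.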
But this result rests on a fairly narrow base: each annotator checked claims from five articles, which works out to a subsample of roughly 250 websites. Compared to the roughly 10,000 URLs per month across 33 months underlying the full study, that's a tiny slice. The method also captures only a narrow kind of truth decay: clearly refutable individual claims. Subtler forms, such as vague, suggestive, or simply unverifiable assertions, which are likely common in AI text, slip right through. And because an AI model decides upfront which statements count as "verifiable" and get sent to annotators, the test skews conservative.
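The size of that slice follows directly from the figures quoted above. The short calculation below uses only those numbers; the variable names are illustrative, and the claim count is an upper bound because pages yielded "up to" five claims each.

```python
annotators = 50
articles_per_annotator = 5
max_claims_per_page = 5
urls_per_month = 10_000
months = 33

annotated_pages = annotators * articles_per_annotator   # 250 pages
max_claims = annotated_pages * max_claims_per_page      # at most 1,250 claims
full_sample = urls_per_month * months                   # roughly 330,000 URLs
fraction = annotated_pages / full_sample                # about 0.076 percent

print(f"{annotated_pages} of {full_sample} URLs = {fraction:.3%}")
```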
"The most surprising result was that our Truth Decay hypothesis wasn't confirmed," Dolezal told 404 Media. "It's worth noting that we were specifically looking for an increase in verifiably untrue statements, which we didn't find. But it could still be the case that AI is quietly increasing the volume of unverifiable claims, ones that can't be checked against existing fact-checking tools and infrastructure."
The researchers conclude that the real threat isn't outright falsehoods but a creeping shift in how people relate to online information. As AI text becomes ubiquitous and nearly indistinguishable from human writing, users may start writing off the credibility of online information across the board. The study calls this "reality apathy."
Public perception doesn't match the data
The researchers also surveyed 853 US adults in a representative poll. Most respondents believed in all of the negative hypotheses, including the four that didn't hold up empirically. For example, 83 percent agreed that individual writing styles are vanishing in favor of a generic AI voice. The data didn't back that up either.
People who rarely use AI were more likely to believe in negative effects than regular users (88.3 versus 76.2 percent), according to the study. Among AI skeptics, the gap was even wider (91.3 versus 71.1 percent).
The researchers warn that the high share of AI content turns the theoretical risk of "model collapse," where AI models degrade by training on their own outputs, into a practical problem. Instead of relying on after-the-fact detection, they recommend cryptographic provenance standards like C2PA, plus a rethink of search and recommendation algorithms to reward semantic diversity.
Co-author Maty Bohacek of Stanford says the team is already working with the Internet Archive to turn the analysis into an ongoing monitoring tool that tracks the share of AI content on the web over time. "We're now working with the Internet Archive to turn this into a continuous tool that keeps providing this signal going forward, rather than a single fixed snapshot bounded by the static nature of a paper," Bohacek told 404 Media.
The study has limits the researchers acknowledge themselves. Only English-language texts were analyzed; other languages and formats like images or video were left out. The entire analysis hinges on the reliability of the Pangram v3 detector, and its accuracy could change as language models keep evolving. The data also comes only from the Internet Archive, which doesn't represent the whole web.