
An emerging chatbot ecosystem builds on existing web content and could displace traditional websites. At the same time, questions of licensing and financing remain largely unresolved.


OpenAI offers publishers and website operators an opt-out if they prefer not to make their content available to chatbots and AI models for free. This can be done by blocking OpenAI's web crawler "GPTBot" via the robots.txt file. The bot collects content to improve future AI models, according to OpenAI.
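For illustration, the robots.txt content below blocks GPTBot site-wide and leaves other crawlers unrestricted. The `User-agent: GPTBot` / `Disallow: /` pair follows OpenAI's published opt-out instructions, but the rest of the file, the example.com URL, and the helper name `build_parser` are hypothetical. The sketch uses Python's standard `urllib.robotparser` to show how a crawler that honors robots.txt would interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: it blocks only OpenAI's
# GPTBot and allows every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/news/article.html"
    # GPTBot matches its own entry (Disallow: /); Googlebot falls through to "*".
    for agent in ("GPTBot", "Googlebot"):
        print(agent, "allowed:", rp.can_fetch(agent, url))
```

robots.txt is advisory: it only keeps out crawlers that choose to respect it, which is the mechanism OpenAI's opt-out relies on.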

Major media companies including the New York Times, CNN, Reuters, Chicago Tribune, ABC, and Australian Community Media (ACM) are now blocking GPTBot. Other web-based content providers such as Amazon, wikiHow, and Quora block the OpenAI crawler as well.

According to an analysis by Originality.ai, 9.2 percent of the top 1,000 websites were blocking GPTBot at the end of August, with a weekly growth rate of five percent. Of the 759 robots.txt files analyzed, 69 contained the block. Among the top 100 sites, the share is 15 percent.

Websites have been able to block GPTBot since the beginning of August, and just under ten percent of the top 1,000 websites use this option. | Image: Originality.ai

The largest German news portals, Bild.de, t-online.de, and n-tv.de, have not yet blocked GPTBot, and Spiegel Online still allows OpenAI on its site. By contrast, news portals such as sueddeutsche.de, zeit.de, and welt.de have modified their robots.txt to exclude GPTBot. The German public broadcaster SWR also blocks GPTBot.

Chatbots vs. WWW

Blocking GPTBot is only half the battle; blocking the ChatGPT user agent may matter more. ChatGPT plugins, such as OpenAI's own browsing feature, use this agent to access web pages, pull their content into the chat, and discuss it there.

This eliminates the click-through to the website, and with it the monetization: a direct loss for the website operator, even if the content is not stored long-term or used for AI training. In most cases, then, anyone who blocks GPTBot should also want to block the ChatGPT user agent.
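A rough way to audit this is sketched below: it parses a robots.txt string and reports which OpenAI agents it blocks, flagging sites that block GPTBot but still admit the browsing agent. `ChatGPT-User` is the user-agent token OpenAI documents for requests made on behalf of ChatGPT users; the function names and sample files are hypothetical. Python's `urllib.robotparser` only does simple prefix matching on paths (no wildcards), so this is a sketch, not a full robots.txt validator.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens as named in OpenAI's crawler documentation; check the
# current docs before relying on this list.
OPENAI_AGENTS = ("GPTBot", "ChatGPT-User")


def blocked_agents(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each OpenAI user agent to True if robots.txt forbids it from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in OPENAI_AGENTS}


def has_browsing_gap(robots_txt: str) -> bool:
    """True if the training crawler (GPTBot) is blocked but ChatGPT-User is not."""
    status = blocked_agents(robots_txt)
    return status["GPTBot"] and not status["ChatGPT-User"]


# Hypothetical site that blocks only the training crawler ...
ONLY_GPTBOT = """\
User-agent: GPTBot
Disallow: /
"""

# ... and one that also blocks the agent used when ChatGPT fetches pages.
BOTH = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""

print(blocked_agents(ONLY_GPTBOT))  # {'GPTBot': True, 'ChatGPT-User': False}
print(has_browsing_gap(ONLY_GPTBOT))  # True
print(has_browsing_gap(BOTH))  # False
```

Because robotparser falls back to the `*` entry (or allows everything if there is none) when no specific entry matches, a file that names only GPTBot leaves ChatGPT-User unrestricted.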

That said, OpenAI has already pulled back from AI browsing. The official reason is that the feature could be used to circumvent paywalls, an unintended side effect. Unofficially, the unresolved legal status of directly processing third-party content probably plays a bigger role.

Microsoft, however, continues to offer Bing Chat, which shows slightly reworded website content in the chat window. Google's AI search, currently in testing, works in a similar way.


None of the major AI companies has yet presented a blueprint for keeping the web's content ecosystem from falling victim to the success of chatbots. So far, company leaders such as Microsoft's Satya Nadella have only paid lip service to the problem.

The legal questions will probably have to be settled in court, most likely between the big publishers and the big AI companies such as Google, Microsoft, and OpenAI. The New York Times is reportedly preparing a lawsuit against OpenAI that could set a precedent for the entire industry.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • OpenAI offers publishers and website operators an opt-out option if they do not want their content to be used for free in the development of chatbots and AI models.
  • This can be done by blocking the web crawler "GPTBot" via robots.txt. Many major media companies have banned GPTBot from their sites.
  • Almost 10 percent of the top 1,000 websites have blocked GPTBot. Among the top 100 sites, 15 percent have implemented a block.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.