Here is how to block OpenAI from using your web content for ChatGPT

OpenAI's GPTBot crawls the web for content that can be used by AI models. If you do not want this, you can block the bot.

The content that GPTBot visits can be used to improve future AI models, according to OpenAI. Those who give GPTBot access to their content are helping to make AI models more accurate, capable, and safe, the company writes.

Block GPTBot from crawling your site

If you do not want to share your content with OpenAI's models for free, you can block GPTBot. With a "User-agent: GPTBot" rule, you can block the bot either from your entire site or from individual folders or categories. As with a Google crawler, you control GPTBot by adding it to your robots.txt with the following directives:

User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

User-agent: GPTBot
Disallow: /

Example (allow one directory, block another):
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
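
Before deploying rules like these, it can help to check what they actually permit. The following is a minimal sketch, not an official OpenAI tool: it feeds both rule sets from above to `urllib.robotparser` from Python's standard library and asks whether the GPTBot token may fetch a given URL. The domain `example.com`, the page paths, and the helper name `build_parser` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Rule set 1: block GPTBot from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rule set 2: the per-directory example.
PER_DIRECTORY = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

# Placeholder domain, chosen only for this illustration.
SITE = "https://example.com"

block_all = build_parser(BLOCK_ALL)
per_directory = build_parser(PER_DIRECTORY)

if __name__ == "__main__":
    for path in ("/", "/directory-1/page.html", "/directory-2/page.html"):
        print(
            path,
            "| block-all:", block_all.can_fetch("GPTBot", SITE + path),
            "| per-directory:", per_directory.can_fetch("GPTBot", SITE + path),
        )
```

Pass the short token `GPTBot` to `can_fetch`, not the full user-agent string, because the parser matches on the token. Python's parser applies rules in file order, which may differ from how an individual crawler resolves overlapping rules; the two rule sets here do not overlap, so the result does not depend on that detail.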

According to OpenAI, pages that sit behind paywalls, request personally identifiable information, or violate OpenAI's content guidelines are automatically filtered out. Full instructions are available in OpenAI's GPTBot documentation.
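
Since robots.txt is a voluntary convention, a site operator may also want to see whether GPTBot shows up in the server's access logs. A minimal sketch, assuming the logs record the User-Agent header: it looks for the GPTBot token from the user-agent string quoted above. The function name `is_gptbot` and the browser string used for contrast are made up for this example.

```python
GPTBOT_TOKEN = "GPTBot"


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token."""
    return GPTBOT_TOKEN.lower() in user_agent.lower()


# Full user-agent string as quoted in the article.
GPTBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)

# Made-up browser string, used only for contrast.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"

if __name__ == "__main__":
    print(is_gptbot(GPTBOT_UA))   # True
    print(is_gptbot(BROWSER_UA))  # False
```

A User-Agent header can be set to anything by the client, so a match only indicates that a request claims to come from GPTBot.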

ChatGPT and the Content Dilemma

With the launch of ChatGPT's web browsing feature, OpenAI announced that website owners such as publishers could block the crawling bot if they did not want their content to be used within or for ChatGPT.

Blocking the bot, however, means not being present in a potentially emerging content ecosystem - a dilemma similar to (non-)indexing in Google search, where content providers inadvertently become both suppliers to and financially dependent on a third-party ecosystem.

In the case of chatbots, however, the starting position for content providers is even less favorable: While search engines are (largely) designed to direct searchers to sites where they can provide value to the site operator, chatbots are optimized to provide searchers with the most direct and comprehensive answers possible directly in chat. This almost exclusively benefits the provider of the chatbot.

OpenAI does not currently offer web browsing, following the discovery that ChatGPT browsing could partially read content behind paywalls and pull it into the chat for free. It is not known when the browsing plugin will be back online. Perhaps OpenAI is concerned about further legal repercussions for the reasons mentioned above.

Meta, Microsoft and Google also train their chatbots with copyrighted material and pull content from websites into their chatbots without further consent. They are reportedly in talks with publishers, who want to charge billions for the use of their content.

So far, major chatbot providers like Microsoft have paid lip service, at best, to keeping the web ecosystem open. Google's new AI search is designed to keep users in the Google ecosystem much longer than traditional web search.
