AI in practice

OpenAI lets ChatGPT loose on the Internet again

Matthias Bastian

Update
  • With the new browsing update, ChatGPT no longer accesses sites that block it, even when their content is not behind a paywall.
  • For example, it is no longer possible to access or read the latest news from the New York Times.

ChatGPT can now access the internet again. According to OpenAI, the chatbot is now better at following crawling instructions from web pages.

For paying customers of ChatGPT Plus and Enterprise, web browsing is active again today. The feature will be rolled out to all users "soon".

OpenAI continues to use Microsoft's Bing search engine for web browsing. To enable it, select "Browse with Bing" in the drop-down menu under GPT-4.

According to OpenAI, browsing is particularly useful for questions about events and information from after September 2021, when ChatGPT's training data ends.

OpenAI says that ChatGPT browsing now follows instructions from websites about what content, if any, ChatGPT is allowed to access. Site operators set these rules via robots.txt or by user agent, an option some publishers are already using. ChatGPT also links in its generated responses to the web sources from which it has taken content.
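How a robots.txt rule aimed at a specific user agent plays out can be sketched with Python's standard library. This is only an illustration of the general mechanism, not OpenAI's implementation: the user agent token "ChatGPT-User", the example rules, and the example.com URL are assumptions that do not come from OpenAI's announcement as quoted here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. The "ChatGPT-User"
# token is an assumed name for the browsing agent, used for illustration.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/article"  # placeholder URL
    for agent in ("ChatGPT-User", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

A well-behaved client runs a check like this before every request. Compliance is voluntary, though: robots.txt only works if the client chooses to honor it.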

OpenAI improves ChatGPT's web hygiene

OpenAI pulled the browsing feature in early July after launching it in beta in May. Users had found that the language model could bypass publishers' paywalls. According to OpenAI, this behavior was unintentional. The company disabled the feature, saying it wanted to "do right by content owners."

Yet OpenAI doesn't mention paywall violations in its announcement of the new browsing version, even though they were cited as the main reason for the feature's removal. The reference to better robots.txt and user agent compliance probably signals that OpenAI considers the problem solved, or at least reduced, without saying so explicitly, since the company is already involved in enough lawsuits.

In any case, the reasoning given for taking the feature offline in July seemed like a pretext and not well thought out: paywalled content accounts for a small part of most publishers' revenue. What counts is traffic to the site as a whole.

ChatGPT potentially undermines the entire traffic-based ecosystem of web publishing if it processes current text that it pulls from sites for free but sends only a small fraction of its users to the source of that text.

The same criticism applies to Microsoft's Bing Chat and Google's Search Generative Experience. While all major chatbot providers acknowledge the dilemma, they have yet to offer solutions.
