
Perplexity AI is under scrutiny for possible copyright infringement and questionable data collection practices.

According to WIRED, Amazon Web Services (AWS) has initiated an investigation into whether Perplexity is violating its terms of service. The main concern is that Perplexity allegedly crawls websites and uses their content despite explicit prohibitions against such use.

WIRED reports that Perplexity ignores the Robots Exclusion Protocol, a web standard that lets websites tell automated bots which pages they may not access. Although the protocol is not legally binding, compliance with a site's robots.txt file is required by the terms of service of cloud providers such as Amazon.

But Perplexity's bot regularly scraped content from WIRED, even though WIRED's publisher, Condé Nast, had explicitly blocked the bot. In some cases, Perplexity reproduced WIRED's content verbatim, according to the magazine.
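
As a rough illustration of what compliance with the Robots Exclusion Protocol involves, the sketch below shows the check a well-behaved crawler could run before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt rules, the bot names `ExampleBot` and `OtherBot`, and the `example.com` URL are all hypothetical and are not taken from WIRED's or Perplexity's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is blocked from the whole site,
# every other bot is allowed. These rules are an assumption for illustration.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would download
    # https://<site>/robots.txt first instead of using a string literal.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/story/some-article"
    print(is_allowed(ROBOTS_TXT, "ExampleBot", url))  # blocked bot
    print(is_allowed(ROBOTS_TXT, "OtherBot", url))    # any other bot
```

With these hypothetical rules the first check returns `False` and the second returns `True`. The protocol is purely advisory: nothing in the standard stops a crawler from skipping this check, which is why the reported behavior is treated as a terms-of-service question rather than a technical one.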


The bot sometimes uses special URLs to access paywalled content. A Perplexity spokesperson acknowledged that this occurs but said it is "very infrequent" and happens only when a user explicitly enters the URL in the prompt. The spokesperson added that Perplexity's bot does not violate AWS's terms of service.

OpenAI's ChatGPT faced similar criticism for bypassing paywalls, which led OpenAI to temporarily shut down the chatbot's web browsing feature to address the issue. OpenAI is now signing deals with media companies to display their current content in ChatGPT. Perplexity, despite substantial funding and a very high valuation, may struggle to afford similar agreements.

Perplexity CEO Aravind Srinivas defended his startup, saying that the criticized crawlers came from a third-party company that provides crawling and indexing services. He claimed it would be "complicated" to stop these services, but declined to name the company Perplexity works with, citing a non-disclosure agreement.

Perplexity recently launched "Pages", a product that automatically aggregates content from multiple sources, compiles it into a landing page, and has it indexed by Google to compete with original content.

Following the release of Pages, criticism of Perplexity intensified after Forbes revealed that Perplexity was plagiarizing its content. Previously, Perplexity CEO Aravind Srinivas told Forbes that his tool does nothing different from other news sites that quote journalistic primary sources. This suggests that Srinivas understands neither journalism nor his tool.


Perplexity is reportedly planning to offer publishers a revenue share in the future.

Summary
  • AI startup Perplexity AI is being criticized over possible copyright infringement and the questionable data collection practices behind its "answer engine". Amazon Web Services has launched an investigation into whether Perplexity is violating its terms of service.
  • The investigation centers on allegations that Perplexity crawls websites and uses their content even though the sites specifically prohibit such use. Perplexity allegedly disregards the Robots Exclusion Protocol, a web standard for blocking bots.
  • With Pages, Perplexity unveiled a product that automatically collects content from multiple sources, aggregates it into landing pages, and indexes it on Google, where it competes with original content. CEO Aravind Srinivas compared this to the work of news sites, showing a lack of understanding of journalism and the company's own tool.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.