The Atlantic has developed a search tool that lets users check whether their work appears in LibGen, a massive archive of pirated books, scientific papers, and articles that was reportedly used to train language models. According to court documents, Meta used the LibGen dataset to train its Llama models. OpenAI told Gizmodo that LibGen content is not included in the current versions of ChatGPT or in OpenAI's API. Other AI companies have not yet commented on whether they used LibGen data to train their models. Microsoft recently began offering book licensing deals to publishers.
