Generative AI needs a court ruling on fair use of training data to stop all these lawsuits


A group of authors led by Pulitzer Prize winner Michael Chabon has filed suit against Meta and OpenAI in federal court in San Francisco. Another one, you might rightfully ask.

The allegations are the same as in the pending lawsuits: direct and vicarious copyright infringement, removal of copyright information, unfair competition, and negligence.

The authors allege that their copyrighted works have been included in the training material of the respective AI systems without authorization, specifically in the so-called book datasets.

The filing offers no direct evidence of this; the claim is made on "information and belief". The plaintiffs cite ChatGPT's ability to write detailed summaries of their books as an indication that the books were part of the training data.

However, these summaries could also be based on summaries from the Internet and would not necessarily mean that the system was trained with the complete books.

OpenAI seeks court clarification on fair use

In a response to a nearly identical lawsuit, OpenAI neither confirmed nor denied that the plaintiffs' books were part of the training material. Instead, the company argued that it was making fair use of the data to develop new products, which copyright law allows even without the authors' consent.

However, OpenAI denied all other allegations, such as the claim that copyright notices had been removed. It presumably did so with the intention of going to court and obtaining a resolution of the most fundamental question, one that could make similar AI litigation obsolete: Is it fair use to use copyrighted works to train artificial intelligence?

With these lawsuits against Meta, OpenAI, and Google DeepMind, and similar unanswered questions surrounding the use of code and images for AI training, it is clear that this question needs a fundamental court ruling before everyone can move forward.

To provide peace of mind, Microsoft even offers to cover any legal costs for its customers should a lawsuit arise as a result of working with its generative AI offerings.
