AI copyright far from settled as Japan and Israel stake out early positions

What data can be processed to train AI models? Japan and Israel are staking out initial positions, but like everything else on this topic, these are still in the early stages.

Large language and image models are trained on a huge amount of data from the Internet. Much of this data is copyrighted and has not been explicitly released for AI model training.

As a result, the legal viability of such models has been debated, especially in design and art, and particularly since the arrival of widely available image generators such as Stable Diffusion.

Japanese law supports generative AI

In a hearing with Japanese politician Takashi Kii in late April, Japan's Minister of Education, Culture, Sports, Science and Technology, Keiko Nagaoka, confirmed that existing Japanese law allows the use of data collected on the Internet for both non-commercial and commercial purposes. She said this in response to his question about potential copyright issues with generative AI.

This is not an explicit endorsement of large AI models trained on copyrighted data, but it is a snapshot of where Japanese law currently stands. At the same meeting, Kii said he believes new copyright rules adapted to the AI era are needed, so the issue is far from resolved.

Kii also said that Japan does not yet have rules for dealing with generative AI in an educational context.

Israel's Ministry of Justice weighs in on copyright and AI training data

A more specific position paper published by the Israeli Ministry of Justice in 2022 (via Project Disco) states that "typically" the fair use doctrine applies to AI training data from the web, and that some projects may fall under a doctrine that allows "incidental use of copyrighted material" if the copyrighted works are deleted at the end of the training process.

Excluded from this approach are models trained specifically on the works of individual creators in order to compete with them. One example would be an AI system trained exclusively on the Harry Potter novels to generate more of them.

In addition, the Ministry of Justice notes that the statement covers only training, not the output of the systems, which could infringe copyright regardless of how the models were trained.

Another special case in the copyright debate is likely to be chatbots, such as those from Microsoft, OpenAI, and Google, which scan web content in real time and present it in slightly modified form, for example as a search result.

This copyright debate is separate from the debate over copyrighted material in training datasets, although publishers are likely to try to assert any rights they may have if their works are used for AI training or generation without their permission.