The European Data Protection Board (EDPB) has published a preliminary report on investigations into ChatGPT by national data protection authorities. The EDPB identifies several problematic practices in OpenAI's processing of personal data.
Until February 15, 2024, OpenAI had no establishment in the EU, which allowed national supervisory authorities to open investigations independently.
With the establishment of OpenAI Ireland Limited, the company now falls under the so-called one-stop shop mechanism. This means that the Irish DPA is expected to take primary responsibility for overseeing OpenAI in Europe.
The EDPB Task Force has developed a joint list of questions that several authorities have submitted to OpenAI. It covers a range of data protection issues, including the legal basis for data processing, transparency for data subjects, data security, data retention periods, and data subjects' rights. The EDPB questionnaire can be found on page 9 of the report.
OpenAI remains responsible for GDPR compliance, even when users enter personal data in their prompts
The EDPB notes that OpenAI collects large amounts of personal data by scraping publicly available sources on the web. Where OpenAI relies on legitimate interest as the legal basis for this processing, that interest must be weighed against the interests of the data subjects. OpenAI must at least consider technical measures to exclude certain data categories and sources, and to anonymize or delete data before training.
The processing of special categories of personal data, such as data relating to health or sexual orientation, is only allowed under strict conditions. The EDPB stresses that the mere fact that people have posted something publicly does not mean it may be used. Filtering measures should also be in place during and after data collection to exclude these categories of data.
The report also looks at how user input is used to train language models. OpenAI relies on "legitimate interest" as the legal basis for this processing. The EDPB believes that users should at least be clearly informed that their input may be used for training, and that this transparency is a factor when balancing interests.
OpenAI has made some improvements in this area since the launch of ChatGPT, but it often remains unclear when and how data is used for AI training, and what risks are involved.
In addition, OpenAI shouldn't put the onus on users to ensure that their prompts are GDPR-compliant. If a publicly accessible chatbot is fed with personal data, the provider is still responsible for ensuring that its service remains GDPR-compliant.
Technical impossibility is not an argument for breaking the law
The EDPB reminds OpenAI that it should make it easy for data subjects to exercise their rights, including the rights of access, erasure and rectification. OpenAI should also improve the way it assists users in exercising these rights.
According to the report, "technical impossibility cannot be invoked to justify non-compliance". This statement could have far-reaching consequences, given that AI models, once trained, are largely static black boxes from which the provider cannot simply delete individual personal data.
Overall, the EDPB believes that OpenAI and other providers of similar language models still have significant work to do to meet the requirements of the GDPR. Investigations by national supervisory authorities are ongoing, and the considerations in the report should be seen as preliminary assessments. Landmark decisions are still pending.