OpenAI’s “web scraping” method of collecting training data is still not compliant with the EU’s data protection rules, according to an updated report from the European Data Protection Board (EDPB).

The EDPB created a working group specifically for the AI start-up in April 2023.

Its updated report, published today (24 May), stated that OpenAI’s automated data collection from public online sources could not guarantee that personal data had not been used to train its AI chatbot ChatGPT. 

“Considering large amounts of data is collected via web scraping, it is usually not practicable or possible to inform each data subject about the circumstances,” read the report. 

ChatGPT is trained on a swathe of data that it uses to replicate human-like text by predicting what the next word in a sentence is likely to be.  

OpenAI has already faced multiple copyright claims from authors and creatives who allege that their work has been ingested by ChatGPT without their consent.

In April 2024, OpenAI was also hit by a GDPR complaint filed with Austria’s data protection authority by privacy rights group NOYB.

The complaint alleged that ChatGPT breaks GDPR by denying public figures the right to delete any data that ChatGPT has ingested about them.

The EDPB’s new report also suggested that OpenAI needed to place safeguards in ChatGPT to delete or anonymise any personal data used to train the chatbot. 

Additionally, the report stated that content generated by ChatGPT was not always factually accurate. 

“The current training approach leads to a model which may also produce biased or made up outputs,” it read. 

The EDPB stated that the majority of ChatGPT’s users are likely to take its responses as true. For that reason, it said, OpenAI must comply with GDPR’s principle of data accuracy and take a proactive approach to doing so.