OpenAI’s "web scraping" still not compliant, warns EU regulator

The European Data Protection Board (EDPB) stated that OpenAI's automated data collection did not comply with GDPR.

The total AI market is set to be worth $1.04trn by 2030, according to GlobalData. Credit: Giulio Benzin/Shutterstock.

OpenAI’s “web scraping” method of collecting training data is still not compliant with the EU’s data protection rules, according to an updated report from the EDPB.

The EDPB created a working group specifically for the AI start-up in April 2023.

Its updated report, published today (24 May), stated that OpenAI’s automated data collection from public online sources could not guarantee that personal data had not been used to train its AI chatbot ChatGPT.

“Considering large amounts of data is collected via web scraping, it is usually not practicable or possible to inform each data subject about the circumstances,” read the report.

ChatGPT is trained on a swathe of data that it uses to replicate human-like text by predicting what the next word in a sentence is likely to be.

OpenAI has already faced multiple copyright claims from authors and creatives who claim their work has been ingested by ChatGPT without their consent.

In April 2024, OpenAI was also hit by a GDPR complaint filed with Austria’s data protection authority by privacy rights group NOYB.

The complaint alleged that ChatGPT breaches GDPR by denying public figures the right to have any ingested data that ChatGPT holds on them deleted.

The EDPB’s new report also suggested that OpenAI needed to place safeguards in ChatGPT to delete or anonymise any personal data used to train the chatbot.

Additionally, the report stated that content generated by ChatGPT was not always factually accurate.

“The current training approach leads to a model which may also produce biased or made up outputs,” it read.

The EDPB stated that the majority of ChatGPT’s users are likely to take its responses as true. For that reason, it said, OpenAI had to comply with GDPR’s principle of data accuracy and take a proactive approach.
