OpenAI faces further copyright lawsuits from leading authors

OpenAI has admitted that “written works, plays and articles” are the most valuable in teaching its GPT to write human-like responses.

OpenAI faces increasing copyright challenges. Caption: Shutterstock / Ascannio

Three more authors have filed copyright lawsuits against OpenAI alleging their works were used in the training of its ChatGPT AI.

The authors, Michael Chabon, Rachel Snyder and Ayelet Waldman, have all claimed their published works have been used in the training process of ChatGPT without their consent or knowledge.

OpenAI faced similar lawsuits from writers in July this year.

According to court documents, OpenAI has previously admitted that, of all its training datasets, “written works, plays and articles” were the most valuable in teaching its GPT to write human-like responses.

Whilst “long-range” information like novels or news articles can help make ChatGPT more reliable, such works are often protected by copyright and authorship rights, meaning that they cannot be used without the author’s knowledge and consent.

Unlike the previous July lawsuits, this court document specifically mentions that the plaintiff authors have had work published in magazines and newspapers like The New Yorker and The New York Times.

The document also references OpenAI’s 2018 paper Improving Language Understanding by Generative Pre-Training.

Published to introduce GPT-1, the paper explains that the software was trained using “BookCorpus”, a database consisting of around 11,000 self-published novels, as well as “Common Crawl”, an online archive of crawled web pages.

In a subsequent 2020 paper, Language Models are Few-Shot Learners, OpenAI disclosed that it trained its GPT-3 model on datasets of published novels, but it has never disclosed what copyrighted material those datasets contain.

Analyst GlobalData predicts that copyright will continue to be a headache for generative AI companies wishing to train their software.

“Copyright lawsuits are piling up in the generative AI space… Similar lawsuits will decide the terms and future applications of this technology,” according to the analyst.

GlobalData also recognises that very few countries have published clear AI regulation regarding copyright.

Japan was criticised this July for its decision not to protect authorship rights under its AI regulation, arguing that copyrighted material should be usable in the training of AI software.

In a GlobalData survey conducted in February 2023, 29.2% of businesses answered that they had already implemented generative AI software like ChatGPT into their work.

As this technology becomes ubiquitous in the workplace and demand for more accurate responses grows, OpenAI will be under greater pressure to train its software on longer-form data.
