Three more authors have filed copyright lawsuits against OpenAI alleging their works were used in the training of its ChatGPT AI.

The authors, Michael Chabon, Rachel Snyder and Ayelet Waldman, all claim that their published works were used to train ChatGPT without their consent or knowledge.

OpenAI faced similar lawsuits from writers in July this year.

According to court documents, OpenAI has previously admitted that, of all its training datasets, “written works, plays and articles” were the most valuable in teaching its GPT models to write human-like responses.

Whilst “long-range” material such as novels or news articles can help make ChatGPT more reliable, it is often protected by copyright, meaning that it cannot be used without the author’s consent or knowledge.

Unlike the filings in the July lawsuits, this court document specifically mentions that the plaintiff authors have had work published in magazines and newspapers such as The New Yorker and The New York Times.

The document also references OpenAI’s 2018 paper Improving Language Understanding by Generative Pre-Training.

Published to introduce GPT-1, the paper explains that the software was trained using “BookCorpus”, a database consisting of around 11,000 self-published novels, as well as the online database “Common Crawl”, which contains billions of web pages.

In a subsequent 2020 paper, Language Models are Few-Shot Learners, OpenAI disclosed that it had trained its GPT-3 model on datasets of published novels, but it has not since disclosed what copyrighted material these datasets include.

Analyst GlobalData predicts that copyright will continue to be a headache for generative AI companies wishing to train their software.

“Copyright lawsuits are piling up in the generative AI space… Similar lawsuits will decide the terms and future applications of this technology,” according to the analyst.

GlobalData also recognises that very few countries have published clear AI regulation regarding copyright.

Japan was criticised this July for its decision not to protect authorship rights under AI regulation, having taken the position that copyrighted material may be used in the training of AI software.

In a GlobalData survey conducted in February 2023, 29.2% of businesses answered that they had already implemented generative AI software like ChatGPT in their work.

As this technology becomes ubiquitous in the workplace and demand for more accurate responses grows, OpenAI will come under greater pressure to train its software on longer-form data.