37 websites have been identified as using AI to plagiarise articles from the New York Times, Reuters and CNN, according to research published by NewsGuard in August 2023.

NewsGuard also alleges that many of the identified websites appear to have no human moderation or editorial production at all.

Spotting these AI-generated articles is also becoming harder.

Speaking directly to NewsGuard, Amir Tayrani, a partner at law firm Gibson Dunn, explained that “the ability of AI to generate this original content is really something we’ve only seen over the last year or so.”

“We’re now in a world where it is increasingly difficult to distinguish between human content and AI-generated content, and increasingly difficult to identify these types of potential examples of plagiarism,” Tayrani concluded.

The main “tell” that NewsGuard identified in these articles was the presence of sentences that read like an LLM confused by its prompt.


According to its research, multiple articles examined contained phrases such as “As an AI language model, I cannot rewrite this…” as well as other instances of the LLM referring to itself. The report also found multiple examples in which the LLM claimed to be unable to rewrite the content in a “Google-friendly” way.

Not only did this leave the websites in a grey area of copyright and plagiarism claims, but NewsGuard also found that one website had rewritten an article from the far-right website Breitbart. 

When NewsGuard attempted to contact the 37 websites, only one responded.

A representative for one of the websites allegedly responded to NewsGuard that it was a leading digital news website with a focus on Nigeria, before stating “you are all mad”. 

The difficulty of regulating AI-powered plagiarism

As pointed out by NewsGuard, the usage policies for both ChatGPT and Google Bard prohibit the use of their technology for plagiarism.  

However, content that has been rewritten by AI sits in a liminal space between original content and plagiarism. AI authorship has already come under scrutiny: whilst one author has written an entire novel, Death of an Author, with substantial AI input, the US Copyright Office has stated that AI cannot claim authorship over generated text.

“The core issue in legislating against content generation bots,” explains GlobalData senior analyst Maya Sherman, “is their dual use.” 

As Sherman points out, not all AI-generated content, such as Stephen Marche’s Death of an Author, is plagiarised.

There is also difficulty, she says, in determining whether a piece of text is human- or AI-generated. Without the instant tell of an LLM referring to itself, recognising whether an AI or a person is behind a text becomes considerably harder, something NewsGuard’s report also acknowledges.

Referring to the articles with this tell, Filippo Menczer, an AI researcher at Indiana University, stated that they were written by “careless bad actors”.

“All [you] have to do is look for a string that says ‘As an AI language model’,” Menczer elaborated, explaining that, for more careful bad actors, the tell would be an easy fix.

Additionally, Sherman notes that human writers can still publish misinformation, whether by mistake or intentionally. Any legislation on AI-generated content and misinformation will therefore need to balance user sensitivity and freedom of speech whilst also creating standards for content creation.

For Sherman, this problem also highlights the philosophical discussion of human rights to copy and misinform. 

“To deal with these questions,” Sherman concludes, “it is imperative to deploy constitutional AI and train these plagiarism detection tools with the same datasets used for content generation.” 

The media sector has already called for tighter regulation of the use of AI in journalism, but it seems that AI has already seeped into many websites.