37 websites have been identified as using AI to plagiarise articles from the New York Times, Reuters and CNN, according to research published by NewsGuard in August 2023.

NewsGuard also alleges that many of the identified websites appear to have no human moderation or editorial production at all.

Spotting these AI-generated articles is also becoming harder.

Speaking directly to NewsGuard, Amir Tayrani, a partner at law firm Gibson Dunn, explained that “the ability of AI to generate this original content is really something we’ve only seen over the last year or so.”

“We’re now in a world where it is increasingly difficult to distinguish between human content and AI-generated content, and increasingly difficult to identify these types of potential examples of plagiarism,” Tayrani concluded.

The main “tell” that NewsGuard identified in these articles was the presence of sentences that read like an LLM confused by its prompt.


According to its research, multiple articles examined contained phrases such as “As an AI language model, I cannot rewrite this…” as well as other instances of the LLM referring to itself. The report also found multiple examples in which the LLM claimed to be unable to rewrite the content in a “Google-friendly” way.

Not only did this leave the websites in a grey area of copyright and plagiarism claims, but NewsGuard also found that one website had rewritten an article from the far-right website Breitbart. 

When NewsGuard attempted to contact the 37 websites, only one responded.

A representative for one of the websites allegedly responded to NewsGuard that it was a leading digital news website with a focus on Nigeria, before stating “you are all mad”. 

The difficulty of regulating AI-powered plagiarism

As pointed out by NewsGuard, the usage policies for both ChatGPT and Google Bard prohibit the use of their technology for plagiarism.  

However, content that has been rewritten by AI sits in a liminal space between original content and plagiarism. AI authorship has already come under scrutiny: whilst one author has written an entire novel, Death of an Author, with substantial AI input, the US Copyright Office has stated that AI cannot claim authorship over generated text.

“The core issue in legislating against content generation bots,” explains GlobalData senior analyst Maya Sherman, “is their dual use.” 

As Sherman points out, not all AI-generated content, such as Stephen Marche’s Death of an Author, is plagiarised.

There is also difficulty, she says, in determining whether a piece of text is human- or AI-generated. Without the instant tell of an LLM referring to itself, recognising whether an AI or a person is behind a text becomes considerably harder, something NewsGuard’s report also acknowledges.

Referring to the articles with this tell, Filippo Menczer, an AI researcher at Indiana University, stated that they were written by “careless bad actors”.

“All [you] have to do is look for a string that says ‘As an AI language model’,” Menczer elaborated, explaining that, for more careful bad actors, the tell would be an easy fix.

Additionally, Sherman notes that human writers can still publish misinformation, whether by mistake or intentionally. Any legislation on AI-generated content and misinformation will therefore need to balance user sensitivity and freedom of speech whilst also creating standards for content creation.

For Sherman, this problem also highlights the philosophical discussion of human rights to copy and misinform. 

“To deal with these questions,” Sherman concludes, “it is imperative to deploy constitutional AI and train these plagiarism detection tools with the same datasets used for content generation.” 

The media sector has already called for tighter regulation of the use of AI in journalism, but it seems that AI has already seeped into many websites.