The troubled story of Google’s reCAPTCHA

Google’s reCAPTCHA tool claims to block bot access on millions of websites across the internet, but what are its true intentions?

The internet can be a maddening place. Whether it is websites that buffer endlessly or pop-up advertising that makes the user play whack-a-mole, simply surfing the web has become a challenge in itself.

In 2018, Google offered a glimmer of hope when it announced that reCAPTCHA, the online bot bouncer, would no longer interrupt users with visual challenges to prove they were not robots. With reCAPTCHAv3, the tool would instead run automatically in the background, using machine learning (ML) to analyse risks, and users would only have to complete a manual challenge if their behaviour was flagged as suspicious.
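For readers curious what this looks like from the website's side, here is a minimal sketch in Python. It assumes the flow in Google's public documentation: the page obtains a token in the background, the site's server posts it to the `siteverify` endpoint, and the JSON reply carries a `score` between 0.0 and 1.0 along with the `action` name the page declared. The function names, the 0.5 cut-off, and the three outcomes are illustrative assumptions, not anything prescribed by Google or described in this article.

```python
import json
import urllib.parse
import urllib.request

# Verification endpoint named in Google's reCAPTCHA documentation.
SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

# Hypothetical cut-off for illustration; each site chooses its own.
SCORE_THRESHOLD = 0.5


def decide(result: dict, expected_action: str,
           threshold: float = SCORE_THRESHOLD) -> str:
    """Map a siteverify-style reply to 'allow', 'challenge' or 'reject'."""
    # A failed verification (bad, expired or reused token) is rejected outright.
    if not result.get("success"):
        return "reject"
    # The action name should match the one the page declared for this request.
    if result.get("action") != expected_action:
        return "reject"
    # Higher scores mean the interaction looked more human to the risk model.
    score = result.get("score", 0.0)
    return "allow" if score >= threshold else "challenge"


def verify_token(secret: str, token: str) -> dict:
    """POST the token to the verification endpoint and return the parsed JSON."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(SITEVERIFY_URL, data=data, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline example using a hand-written reply instead of a real network call.
    sample = {"success": True, "score": 0.3, "action": "login"}
    print(decide(sample, expected_action="login"))
```

The point of the sketch is that the user never sees any of this: the only visible outcome is whether a low score triggers a fallback challenge, and that decision rests entirely on a number produced from behavioural data.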

At the time of launch, Google stated that reCAPTCHAv3 would create a “frictionless user experience.” This was a natural continuation of the tool’s journey from a dominant, visible role in detecting bots to a more secondary one. With reCAPTCHAv1, all users were asked to pass a challenge by transcribing distorted text into a box; with reCAPTCHAv2, only half of users had to complete a challenge.

The real question is: What are we sacrificing in return for a frictionless user experience? To borrow Sherry Turkle’s quip, we are doing ourselves a disservice if we take such things at interface value.

If it’s broke, don’t fix it

From a user perspective, it is tough not to like v3—CAPTCHA challenges, ironically, had a knack for leaving the user questioning their own humanity. Surely, fewer interruptions in exchange for more security is a win-win, no?

The problem is that no version of reCAPTCHA does what it says on the tin. In November 2023, researchers at Cornell University published a study that found bots could pass reCAPTCHAv2 with more ease and speed than humans. The study also found that the flaws in reCAPTCHAv2 were inherited by v3 because “there is no discernible difference between reCAPTCHAv2 and reCAPTCHAv3…hence attacks targeting v2 image/audio challenges are also applicable for those of reCAPTCHAv3.”

But what about reCAPTCHAv3’s machine learning risk analysis—the whole point behind getting rid of these challenges? Well, in 2019, a team built a reinforcement learning (RL) attack that defeated it with 97% accuracy. Case closed.

Assuming Google was aware of these flaws, the researchers questioned why it had not deprecated the tool. Their answer came with another finding: reCAPTCHA extensively tracks users’ data, including cookies, browser history, and browser environment (such as device settings and mouse movements). This data is ripe for targeted advertising, but Google claims it does not use it—an impressive, if out of character, feat of willpower for a company that has not been able to stop itself in the past. Nonetheless, users are not informed that such data is collected by Google.

A post-privacy era

Arguably, reCAPTCHA’s data collection goes against the spirit of the EU’s General Data Protection Regulation (GDPR), which stipulates that consent to the processing of user information must be freely given.

For example, the data reCAPTCHA collects is hardly handed over voluntarily, especially when the tool is installed on work, school, or public websites that users are required to access as part of earning a wage or receiving an education. Notably, in September 2024, an Austrian federal court moved to ban reCAPTCHA, finding that it violated users’ privacy rights under GDPR.

At this point, you might find yourself asking, so what? We all know our data is being collected, even if we do not necessarily understand how. Given the sheer volume of data produced, it might be vain to believe Google is watching you specifically—algorithms, after all, are about prediction and profit, not voyeurism.

reCAPTCHA un-optional extras

However, reCAPTCHA is an instance where extra data is being created, virtually out of thin air, and users are then locked into giving it up, all while the tool offers less and less security for web hosts in return; the relationship is entirely asymmetric.

Maybe Google knew from the start that advances in bot intelligence would eventually render reCAPTCHA obsolete. But rather than mothballing the tool altogether, the company decided (quite sensibly) to make something out of it.

Whether the tool is a double agent for enhanced targeted advertising, an alibi for yet more grey-area data collection, or simply a way to extort profit from web hosts on a more hostile internet, reCAPTCHA leaves us with an embarrassing question: Who are the real tools?
