How to detect Captcha farms and block Captcha bots - recaptcha

Brief Summary
Let's start with a brief introduction to what a Google reCaptcha farm is: a service that bot developers can query via an API to automate solving Google reCaptcha. The flow looks like this:
The bot is blocked by a Captcha challenge.
It makes an API call to the Captcha farm with the website’s Captcha public key & its domain name as parameters.
The Captcha farm asks one of its workers to solve the Captcha.
After ~30-45 seconds, the Captcha is solved and the bot obtains its response token.
The bot solves the Captcha by submitting the response token.
In short, solving a Captcha is as simple as calling a function in the bot's code. The attacker doesn't even need to interact directly with the Google reCaptcha by clicking on it. If the attackers know the structure and the URL of the Google reCaptcha callback, i.e. the request where the website sends the Google reCaptcha response token after a successful response has been submitted (which is straightforward by looking at the devtools), they can prove that they've solved a Captcha without even using a real browser.
Problem
My website is fully integrated with Google reCaptcha V2 (Invisible reCaptcha). The implementation follows all the steps listed in the documentation, and it worked like a charm until now. Over time, we have experienced different kinds of attacks that tried to break through our login. The one that caused the biggest problem was a dictionary attack combined with an automated Google reCaptcha solving mechanism. The attackers use farms (or maybe scripts) that solve the Google reCaptcha and generate unique response codes, which are then used by a bot network (different IP addresses around the world, User-Agents, browser fingerprints, etc.). With these codes, the Google reCaptcha is taken out of the picture and we MUST use different mechanisms to block the attackers.
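One mechanism that does not depend on the captcha at all is throttling failed logins per account, since the bot network rotates IP addresses but a dictionary attack keeps guessing against the same usernames. The following is only a minimal in-memory sketch: the class name `LoginThrottle`, the limit of 5 failures and the 15-minute window are illustrative assumptions, not something taken from the reCaptcha documentation.

```javascript
// Hypothetical per-account limiter; names and thresholds are assumptions.
class LoginThrottle {
  constructor({ maxFailures = 5, windowMs = 15 * 60 * 1000 } = {}) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
    this.failures = new Map(); // username -> array of failure timestamps (ms)
  }

  // Drop timestamps that have aged out of the window and return the rest.
  recent(username, now) {
    const kept = (this.failures.get(username) || []).filter(
      (t) => now - t < this.windowMs
    );
    if (kept.length > 0) this.failures.set(username, kept);
    else this.failures.delete(username);
    return kept;
  }

  // True when the account has reached the failure limit inside the window.
  isBlocked(username, now = Date.now()) {
    return this.recent(username, now).length >= this.maxFailures;
  }

  recordFailure(username, now = Date.now()) {
    const kept = this.recent(username, now);
    kept.push(now);
    this.failures.set(username, kept);
  }

  // A successful login clears the account's failure history.
  recordSuccess(username) {
    this.failures.delete(username);
  }
}
```

A login handler would call `isBlocked(username)` before checking the password and `recordFailure(username)` after a wrong one. This only helps against repeated guesses at the same account, and a real deployment would need shared storage rather than a `Map` in one process.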
Question
I reviewed the Google reCaptcha documentation multiple times, along with different topics related to this problem, but couldn't find an easy way to prevent such an attack. I have a few questions and would be very grateful if somebody could answer them:
Is it possible to bind the Google reCaptcha response code to a code challenge, cookie or something similar in order to ensure that the code is generated by the exact client?
Is there any way to distinguish the Google reCaptcha codes, taken from a farm/script and the ones generated by the exact client?
I found that there are some solutions, such as DataDome, which are very expensive. Is there something similar at a lower price, or an algorithm that I can implement on my own?
Big thanks in advance!
Script
Below is a simplification of the script that acts like a Google reCaptcha farm:
bypassReCaptcha();
function bypassReCaptcha() {
    grecaptcha.render(createPlaceholder(), buildConfiguration());
    grecaptcha.execute();
}
function createPlaceholder() {
    document.body.innerHTML += '<div class="g-recaptcha-hacker"></div>';
    return document.getElementsByClassName('g-recaptcha-hacker')[0];
}
function buildConfiguration() {
    return {
        size: 'invisible',
        badge: 'bottomleft',
        sitekey: '<your site-key>',
        callback: (reCaptchaResponse) => localStorage.setItem('reCaptchaResponse', reCaptchaResponse)
    };
}
I am using a server-side validation - something like this:
curl -X POST 'https://www.google.com/recaptcha/api/siteverify?secret=<your secret>&response=<generated code from above>&remoteip=<client IP address>'
It seems that the remoteip parameter is not working as expected: the validation succeeds regardless of the client IP. I checked some topics and it seems that this is a common problem:
Google reCAPTCHA's remoteip parameter is ignored
Is there any reason to include the remote ip when using reCaptcha?
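Since remoteip does not bind the code to a client, the remaining server-side checks are on the fields that siteverify returns: `success`, `hostname` and `challenge_ts`. A minimal sketch of such a check is shown here; the function name `evaluateVerifyResponse`, the option names and the 60-second age limit in the usage are assumptions for illustration. It does not tell a farm-solved code from a genuine one; it only rejects codes issued for another site or codes that are older than you are willing to accept.

```javascript
// Hypothetical helper: inspects the parsed JSON body returned by siteverify.
function evaluateVerifyResponse(
  body,
  { expectedHostname, maxAgeMs = 2 * 60 * 1000, now = Date.now() }
) {
  if (!body || body.success !== true) {
    return { ok: false, reason: 'verification-failed' };
  }
  // The code must have been issued on our own site.
  if (body.hostname !== expectedHostname) {
    return { ok: false, reason: 'hostname-mismatch' };
  }
  // challenge_ts is an ISO timestamp of when the challenge was loaded.
  const issuedAt = Date.parse(body.challenge_ts);
  if (Number.isNaN(issuedAt) || now - issuedAt > maxAgeMs) {
    return { ok: false, reason: 'token-too-old' };
  }
  return { ok: true, reason: null };
}

// Usage with a made-up response body:
const verdict = evaluateVerifyResponse(
  { success: true, hostname: 'example.com', challenge_ts: '2024-01-01T00:00:30Z' },
  { expectedHostname: 'example.com', maxAgeMs: 60000, now: Date.parse('2024-01-01T00:01:00Z') }
);
console.log(verdict.ok);
```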

Related

What actions can we take with google recaptcha enterprise

I have been using Google reCaptcha V2 (Invisible reCaptcha) for a long time, and I know that if a spammer or bot tries to call our API, the user gets a puzzle to solve. What happens if I use the Google reCAPTCHA Enterprise solution? In that case, will the Google API simply return a score based on the action taken by the user?
What if a spammer buys a fresh IP range and tries to call our APIs? How often will Google return a low score for those IPs?
I have seen spammers call the APIs on my website with a new IP every time, so I need to understand how Google detects them as spammers.
The score-based site key is the currently recommended type. You are correct that there is no challenge or puzzle in this case.
https://cloud.google.com/recaptcha-enterprise/docs/choose-key-type
While the IP is part of what determines the score, the scoring is far more complex than that. Malicious users or bots that simply change their IP address will not circumvent the bot detection; the algorithm is quite sophisticated.
The exact signals the score is based on are proprietary, and Google holds those details close to the chest: if adversaries knew them, they could build workarounds for their bots.
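Because a score-based key never shows a puzzle, deciding what to do with the score is left to your back end. A rough sketch of that decision follows; the function name `decideAction`, the two thresholds and the action labels are assumptions, and the right cut-offs depend on the traffic you actually see. The only fixed point is that scores run from 0.0 (likely a bot) to 1.0 (likely a legitimate interaction).

```javascript
// Hypothetical mapping from a reCAPTCHA score to an application-level action.
function decideAction(score, { blockBelow = 0.3, challengeBelow = 0.7 } = {}) {
  // Treat a missing or malformed score as untrusted.
  if (typeof score !== 'number' || Number.isNaN(score)) return 'block';
  if (score < blockBelow) return 'block';
  // Middle band: ask for something extra, e.g. email confirmation or 2FA.
  if (score < challengeBelow) return 'step-up';
  return 'allow';
}

console.log(decideAction(0.9)); // allow
console.log(decideAction(0.5)); // step-up
console.log(decideAction(0.1)); // block
```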

How do Captcha solving services like 2captcha replicate the recaptcha that a particular user receives

How do Captcha solving services like 2captcha replicate the reCaptcha that a particular user receives, given only the site's reCaptcha key and the URL of the site?
Also, can someone help me understand how the reCaptcha payload works?

reCaptcha v3: should I post to Google's verification API even if I have no token?

I'm currently implementing Google's reCAPTCHA v3 to guard a form. Normally, the way this works is that when a user submits the form, a user response token is generated and conveyed to the back end, which then forwards the token to Google's verification API to get an assessment.
I've found that, for certain attacks, the user response token is missing because the attacker is submitting their POST request to my site directly, rather than using the form. This is good because it makes it obvious that they should be rejected. But I'm wondering if I should go ahead and submit a token-less verification request to Google anyway, knowing that it will fail.
I ask because my understanding is that reCAPTCHA gathers information about my site activity over time, using it to fine tune their assessments. I don't know how this is done, so I don't know if it would be useful to exercise the verification API even when I already know what should be the outcome.
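Whatever the answer on the verification API turns out to be, the local rejection itself can happen before any call to Google. A small sketch of that pre-check is shown here; the function name `precheckToken`, the returned shape and the 400 status are assumptions for illustration. It deliberately leaves open whether a token-less verification request is worth sending as well.

```javascript
// Hypothetical guard run before contacting the verification API.
function precheckToken(token) {
  // A direct POST that bypassed the form carries no token at all.
  if (typeof token !== 'string' || token.trim() === '') {
    return { forward: false, status: 400, reason: 'missing-token' };
  }
  return { forward: true, status: null, reason: null };
}

console.log(precheckToken(undefined).reason); // missing-token
console.log(precheckToken('abc123').forward); // true
```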

Google reCAPTCHA in China

My site uses the Google reCAPTCHA control, but I am hearing that it is being blocked in China. Is there any way around this? I see some people reporting that changing the API to https://www.recaptcha.net works in China.
Has anyone tried this? I ask because I see it still going out to Google.
string apiUrl = "https://www.recaptcha.net/recaptcha/api/siteverify?secret={0}&response={1}";
As Google says on its help page, you should use the domain "www.recaptcha.net" instead of "www.google.com" in the API call.
First, replace src="https://www.google.com/recaptcha/api.js" with
src="https://www.recaptcha.net/recaptcha/api.js"
After that, apply the same to everywhere else that uses "www.google.com/recaptcha/" on your site.
Obtained from: https://developers.google.com/recaptcha/docs/faq#can-i-use-recaptcha-globally
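If the reCAPTCHA URLs are scattered across templates or configuration, the host swap described above can be done in one place. This is just a sketch; the helper name `toGlobalRecaptchaUrl` is made up, and it only rewrites URLs that start with www.google.com/recaptcha/.

```javascript
// Hypothetical helper: rewrites www.google.com/recaptcha/ to www.recaptcha.net/recaptcha/.
function toGlobalRecaptchaUrl(url) {
  return url.replace(
    /^(https?:)?\/\/www\.google\.com\/recaptcha\//,
    // Keep the original scheme (or a protocol-relative "//") as it was.
    (match, scheme) => `${scheme || ''}//www.recaptcha.net/recaptcha/`
  );
}

console.log(toGlobalRecaptchaUrl('https://www.google.com/recaptcha/api.js'));
// https://www.recaptcha.net/recaptcha/api.js
```

URLs that do not point at the reCAPTCHA path are returned unchanged.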
Edit: to clarify on some of the comments: if you try this outside of China you do get references to gstatic.com, but if you try it in China, any references to gstatic.com are replaced with gstatic.cn (don't forget to add it to your CSP). So this solution is still valid.
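For the Content Security Policy part, a policy that covers both the gstatic.com and gstatic.cn variants might be assembled roughly like this. The exact source list is an assumption: which directives and paths your page needs depends on your integration, so check the browser console for CSP violations rather than treating this as a complete policy. The function name `buildRecaptchaCsp` is made up.

```javascript
// Hypothetical CSP builder; the listed sources are assumptions, not an official list.
function buildRecaptchaCsp() {
  const scriptSrc = [
    "'self'",
    'https://www.recaptcha.net/recaptcha/',
    'https://www.gstatic.com/recaptcha/',
    'https://www.gstatic.cn/recaptcha/', // served instead of gstatic.com inside China
  ];
  const frameSrc = ["'self'", 'https://www.recaptcha.net/recaptcha/'];
  return `script-src ${scriptSrc.join(' ')}; frame-src ${frameSrc.join(' ')}`;
}

// The result is the value for a Content-Security-Policy response header.
console.log(buildRecaptchaCsp());
```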
IMHO, Google services are not stable in China, as they can be blocked at any time.
Baidu threads also mention that Google reCAPTCHA sometimes works and sometimes doesn't.
https://www.v2ex.com/t/492752 (Chinese)
In the programming world, an unstable function is either useless or means more code for handling exceptions.
If you really need to use Google reCAPTCHA, you had better test it properly from an IP in China (via VPN) first.
Here are some options you can consider:
Use an alternative captcha
A Google search will turn up various captchas.
Build your own captcha
Open Source Invisible reCAPTCHA alternatives
Use a proxy web server (nginx) to send and receive data to and from Google reCAPTCHA
I have shared the solution to this problem by using cURL.
https://stackoverflow.com/a/63568516/11910869
cURL acts as a middle man between the client and the server. So even if google.com/recaptcha can not be accessed by the client because it is blocked by the service provider, cURL can act as the proxy to send the HTTP requests and get the response.
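The server-side leg of that idea, where the back end rather than the browser talks to the verification endpoint, might look like the following in JavaScript. It is a sketch under assumptions: the function names are made up, `fetch` is assumed to be available globally (Node 18 or later), and the host could be swapped for www.recaptcha.net as discussed above. The parameters are sent in the POST body rather than in the query string.

```javascript
// Hypothetical helper: builds the siteverify request without sending it.
function buildVerifyRequest(secret, token, remoteIp) {
  const params = new URLSearchParams({ secret, response: token });
  if (remoteIp) params.set('remoteip', remoteIp); // optional parameter
  return {
    url: 'https://www.google.com/recaptcha/api/siteverify',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: params.toString(),
    },
  };
}

// Sends the request from the server and resolves with the parsed JSON body.
// fetchImpl is injectable so the call can be stubbed in tests.
async function verifyOnServer(secret, token, remoteIp, fetchImpl = fetch) {
  const { url, options } = buildVerifyRequest(secret, token, remoteIp);
  const res = await fetchImpl(url, options);
  return res.json();
}
```

Your own endpoint would receive the token from the browser, call `verifyOnServer`, and return only the outcome to the client, so the secret never leaves the server.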

reCAPTCHA V3 : Do we need to verify token for each page?

Placement on your website
reCAPTCHA v3 will never interrupt your users, so you can run it whenever you like without affecting
conversion. reCAPTCHA works best when it has the most context about
interactions with your site, which comes from seeing both legitimate
and abusive behavior. For this reason, we recommend including
reCAPTCHA verification on forms or actions as well as in the
background of pages for analytics.
Source: https://developers.google.com/recaptcha/docs/v3
The above document says we need to integrate reCAPTCHA V3 on multiple pages. So the question is: do we really need to generate and verify a token for each page, or is just generating the token enough? For example:
grecaptcha.execute(reCaptchaPublicKey, {action: 'cartpage'}).then(function(token) {
    // skip verification
});
Note:
On the form where I want to block bots, I generate a token and pass it to the server with the user's form data. On the server side, I validate the token using the API and get a score in the response, which I use to decide on further action, such as blocking the user action if the score is low.
No. Calling grecaptcha.execute with the appropriate action (use 'homepage' for traffic on your homepage) is enough for the reCAPTCHA service to count and process the visit.
The token passed to your callback is requested from the reCAPTCHA service by the reCAPTCHA client script. Sending it to your server only to send it back to the reCAPTCHA service for a score makes no sense if you don't use the score.
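Put together, a page can share one small wrapper and decide per page whether the token is used. In this sketch the wrapper name `runRecaptcha` and the action names are assumptions, and the `grecaptcha` object is passed in as a parameter only so the function can be exercised outside a browser.

```javascript
// Hypothetical wrapper around grecaptcha.ready + grecaptcha.execute.
function runRecaptcha(grecaptcha, siteKey, action) {
  return new Promise((resolve, reject) => {
    grecaptcha.ready(() => {
      grecaptcha.execute(siteKey, { action }).then(resolve, reject);
    });
  });
}

// Background page: the call itself is what counts, the token is discarded.
//   runRecaptcha(grecaptcha, reCaptchaPublicKey, 'cartpage');
//
// Protected form: the token is sent along with the form data for verification.
//   runRecaptcha(grecaptcha, reCaptchaPublicKey, 'checkout')
//     .then((token) => submitForm(token)); // submitForm is a placeholder
```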

Resources