Diagnosing 403 forbidden error from wget command - cmd

When I try the following code, I get a 403 forbidden error, and I can't work out why.
wget --random-wait --wait 1 --no-directories --user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36" --no-parent --span-hosts --accept jpeg,jpg,bmp,gif,png --secure-protocol=auto --referer=https://pixabay.com/images/search/ --recursive --level=2 -e robots=off --load-cookies cookies.txt --input-file=pixabay_background_urls.txt
It returns:
--2021-09-01 18:12:06-- https://pixabay.com/photos/search/wallpaper/?cat=backgrounds&pagi=2
Connecting to pixabay.com (pixabay.com)|104.18.20.183|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-09-01 18:12:06 ERROR 403: Forbidden.
Notes:
- The input file has the URL 'https://pixabay.com/photos/search/wallpaper/?cat=backgrounds&pagi=2', plus page 3, page 4, etc., separated by new lines.
- I used the long form for the flags just so I could remember what they were.
- I used a cookie file generated from the website called 'cookies.txt' and made sure it was up to date.
- I used the referer 'https://pixabay.com/images/search/' that I found by looking at the headers in Google DevTools.
- I'm able to visit these URLs normally without any visible captcha requirements.
- I noticed one of the cookies, _cf_bm, had Secure = TRUE, so it needs to be sent over HTTPS. I'm not sure whether I'm doing that or not.
It might not actually be possible; perhaps Cloudflare is the deciding factor. But I'd like to know whether it can be circumvented and whether it's feasible to download a large number of files from this website.
Any solutions, insights, or other ways of downloading large numbers of image files would be much appreciated. I know Pixabay has an API, which I might use as a last resort, but I think it's heavily rate limited.
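For reference, the API route would presumably look something like the sketch below; the parameter names and the largeImageURL field are from memory of the Pixabay API docs and should be verified there, and YOUR_API_KEY is a placeholder.
# Hypothetical sketch: search the API for background photos (page 2) and pull out the image URLs.
# Verify key, q, image_type, page, per_page and the response fields against the Pixabay API documentation.
curl -s "https://pixabay.com/api/?key=YOUR_API_KEY&q=backgrounds&image_type=photo&page=2&per_page=200" \
  | grep -o '"largeImageURL" *: *"[^"]*"'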

It seems these image download sites detect that a server is querying them rather than a real person on a normal browser. Trying to circumvent this is probably as futile as trying to fool Google with SEO tricks, as these sites are likely in an ongoing battle to stop people doing mass downloads.
I quit a company that was trying to do exactly that, manipulating images from Google Images to pass off as their own.
A 403 is usually associated with failed logins, but it is equally applicable when used to reject non-standard access to resources.
I think these image download sites should return a 200 response for HEAD-only HTTPS requests so that links to their images can be checked for validity. That would protect their resources while still allowing proper automated site maintenance checks, including checking external links.
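For what it's worth, that kind of link-validity check can already be done from the command line with a HEAD request; a minimal sketch (the URL is just an example):
# -I makes curl send a HEAD request, so only headers come back and no body is downloaded
curl -I https://pixabay.com/photos/search/wallpaper/
# wget's "spider" mode likewise checks that the page exists without saving anything
wget --spider https://pixabay.com/photos/search/wallpaper/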

Related

Kibana String URL image template authentication

I am trying to work out whether it is possible to attach authentication, or make Kibana send some sort of authentication, with URL templates for field formatters in Kibana.
Field formatters are found in:
Management -> Kibana -> Indices -> INDEX_NAME -> Field.
It is possible to display images from URLs with this. For this purpose, I have configured my URL template to be something along the lines of:
localhost:8080/resolve/{imageId}
The imageId is provided via the {{value}} variable, and this all works fine.
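(In the field formatter itself the template therefore ends up looking roughly like this; the host and path are the ones from above:)
http://localhost:8080/resolve/{{value}}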
Now, the server running the image resolver has access to data beyond the scope of the image. I would like to add some authentication to the requests coming in from Kibana. I have printed the available headers and only got this:
{host=[localhost:8082], connection=[keep-alive], user-agent=[Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36], accept=[image/webp,image/apng,image/*,*/*;q=0.8], referer=[http://localhost:5601/app/kibana], accept-encoding=[gzip, deflate, br], accept-language=[en-GB,en;q=0.9,de-DE;q=0.8,de;q=0.7,en-US;q=0.6], cookie=[JSESSIONID.1f47f03c=node01x3mhn2jmd7g4qby84ryfcxmd1.node0; screenResolution=1920x1080; JSESSIONID.d86b4be4=node01gzefm5lc0i3c9itul3p0zoil1.node0; JSESSIONID.9211a7ee=node01v32dtus1uphtcvg72es74h681.node0]}
I can't find any basic authentication in there that I can take advantage of. I am not sure if the cookies can be used to resolve the authentication somehow?
My question is: Can I send basic authentication of the logged in user as part of my request? If so, how?
I realise this is not too much to go on. I am attaching a screenshot for hopefully a little more clarity.
I asked this on the Elastic board as well, and was informed:
The cookies won't be sent to the third party because of the same-origin restrictions that browsers put in place. It's not possible.
https://discuss.elastic.co/t/kibana-string-url-image-template-authentication/165786/2
Thanks!

Cloudflare identifying CURL

So I'm trying to create some scripts that have to run against a particular site protected by Cloudflare. I am running into one odd situation, though:
Whenever I send a cURL request with the command line to that particular website (just a GET request), it reports a 503.
When I do the same request with the Firefox RESTED client, it reports a 200. Running it in my browser executes the JavaScript protection as expected (so a 200 as well).
What could possibly be the trick to distinguishing a cURL request from a Firefox RESTED client request, when both seem to do the exact same thing?
I'm using:
same IP
same User-Agent (in fact I tried mocking over 7 headers that my regular browser sends too, including Accept-Language, Accept-Encoding and more)
Apparently, the RESTED Firefox add-on also sends all the cookies that are currently in your Firefox browser. One of these cookies identified my RESTED client as being valid.
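So if the cURL request is to work on its own, the relevant Cloudflare cookie (and the User-Agent of the browser it came from) has to be sent along as well; a rough sketch, assuming the usual cf_clearance cookie copied out of the browser:
# The cookie name/value must be copied from a browser session that already passed the
# JavaScript challenge, and the User-Agent has to match that browser; example.com is a placeholder.
curl 'https://example.com/protected-page' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36' \
  -b 'cf_clearance=VALUE_FROM_BROWSER'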

jmeter - Authorization header goes missing

I have a fairly simple jmeter script for our site. As part of the flow through the site, I use our API in order to update the user's application.
The API uses OAuth authentication, which I'm familiar with using our own proprietary testing tool.
First I get an auth token via a call to our authorization endpoint. This returns a bit of JSON like this:
{"access_token":"a really long auth token string"}
In my script I use a regex to capture this token string. As part of investigating this problem, I've used a Debug PostProcessor to check that I get the correct string out, which I do. It's saved as variable 'authToken'.
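For reference, the extractor configuration is roughly the following; the field values are reconstructed from the JSON above, so treat them as an illustration rather than the exact script:
Regular Expression Extractor
    Reference Name:      authToken
    Regular Expression:  "access_token":"(.+?)"
    Template:            $1$
    Match No.:           1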
In the very next step in the script, I add a header via an HTTP Header Manager, like so:
I know this header is correct as we have many instances of it in our API tests.
The relevant part of the script looks like this:
Each time I run the script however, the step that uses the token/header returns a 401 unauthorized.
I've tested the actual URL and header in a Chrome plugin and the call works as expected.
In the 'View Results Tree' listener, there is no evidence at all that the Authorization header is set. I've tried hard-coding an auth token but no joy - it still doesn't seem to be part of the request.
From the results tree, the request looks like this:
POST <correct URL>
POST data:{"id":"<item id>"}
Cookie Data: SessionProxyFilter_SessionId=<stuff>; sessionToken=<stuff>
Request Headers:
Content-Length: 52
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36
Connection: keep-alive
Content-Type: application/json
The results tree also shows no redirects.
I've tried the solutions here and here but neither of these worked.
Oddly, I'm almost certain that this worked about a month ago and as far as I can tell nothing has changed on the machine, in the script or with the jmeter installation. Obviously one of these is not true but I'm at my wit's end.
Another member of my team answered this for me and it's fairly simple. I just needed to set the 'Implementation' for the problem step to 'HttpClient4'.

Azure and CORS Access-Control-Allow-Origin with ajax and php

First, I'm not on the web side of our world, so be nice to the backend guy.
A quick background: for a personal need I've developed a Google Chrome extension. It is basically a webpage loaded in a Chrome window and... yeah, that's it. Everything is on the client side (scripts, styles, images, etc.); only the data come from a server through AJAX calls. A cron job calls a PHP script every hour to generate two files. One, data.json, contains the "latest" data in JSON format. The other, hash.json, contains the hash of the data. The client Chrome application uses local storage. If the remote hash differs from the local one, it simply retrieves the data file from the remote server.
As I have a BizSpark account with Azure, my first idea was: an Azure Web Site with PHP for the script, a simple homepage and the generated files, and the Azure Scheduler for the jobs.
I've developed everything locally and everything runs fine... but once on the Azure platform I get this error:
XMLHttpRequest cannot load http://tso-mc-ws.azurewebsites.net/Core/hash.json. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:23415' is therefore not allowed access.
But what I really can't understand is that I'm able (and you will be too) to get the file with my browser... so I just don't get it. I've also tried, based on some posts I found on SO and other sites, to manipulate the config and add extra headers, but nothing seems to be working.
Any ideas?
But what I really can't understand is that I'm able (and you'll be too) to get the file with my browser... So I just don't get it
So when you type in http://tso-mc-ws.azurewebsites.net/Core/hash.json in your browser's address bar, it is not a cross-domain request. However when you make an AJAX request from an application which is running in a different domain (http://localhost:23415 in your case), that's a cross-domain request and because CORS is not enabled on your website, you get the error.
As far as enabling CORS is concerned, please take a look at this thread: HTTP OPTIONS request on Azure Websites fails due to CORS. I've never worked with PHP/Azure Websites so I may be wrong with this link but hopefully it should point you in the right direction.
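One way to see what the browser sees, from the command line, is to send the Origin header yourself and check whether an Access-Control-Allow-Origin header comes back; a quick check along these lines (the URL is the one from the question):
# Dump the response headers and discard the body; until CORS is enabled on the site,
# no Access-Control-Allow-Origin header will appear and the browser will block the AJAX call.
curl -s -D - -o /dev/null -H "Origin: http://localhost:23415" http://tso-mc-ws.azurewebsites.net/Core/hash.json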
OK, this will perhaps be a bit of a troll answer, but that's not my point (I'm a .NET consultant, so... nothing against MS).
I picked a Linux Azure virtual machine, installed Apache and PHP, configured Apache, set some rights, defined the header for CORS, and configured a cron job, all in +/- 30 minutes... As my goal was just to get it running, the problem is solved: it's running.

Can I disable GZIP on Google App Engine?

I'm serving up tiny little chunks of minimized JavaScript through Google App Engine, and I think the GZIP-unGZIP process is slowing me down unnecessarily. (To clarify, I'm sending them quickly to many different websites that request them, and I've optimized most of the other parts of the process.)
For reference, the files are so small that the GZIP savings can be less than the size of the "Content-Encoding: gzip" header itself.
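That overhead is easy to measure; a quick sanity check, where tiny.js stands in for one of the minimized chunks:
# Compare the raw size of a small file with its gzipped size; for very small files
# the compressed output (plus the extra response header) can exceed the original.
wc -c < tiny.js
gzip -c tiny.js | wc -c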
However, from the documentation
If the client sends HTTP headers with the request indicating that the client can accept compressed (gzipped) content, App Engine compresses the response data automatically and attaches the appropriate response headers.
Is there a setting in app.yaml or somewhere that I can disable GZIP-ing? It must be possible since some files are served unzipped.
It's not currently possible to change this behavior from the server side (although, if you control the client, you can remove gzip from its Accept-Encoding header to accomplish the same thing).
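A minimal sketch of that client-side approach, assuming a curl-like client and a placeholder URL:
# Advertise only "identity" so gzip is absent from Accept-Encoding;
# the response should then come back uncompressed (check the Content-Encoding header).
curl -s -D - -o /dev/null -H "Accept-Encoding: identity" https://your-app.appspot.com/tiny.js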
There's an open bug about this with Google, and a team member has marked it "Acknowledged", but it doesn't look like there's been any action on it in the last year or so. You should probably add your voice to that ticket and star it for future notifications.
