I am trying to open a URL with open-uri. When I open it in my browser (Safari), the page loads in about a second, but opening it with open-uri never works: after a minute it raises Net::ReadTimeout: Net::ReadTimeout. I have tried increasing the maximum timeout, but it makes no difference.
open(url).read
This is the code I use to open the URL, and it never succeeds.
Looks like they're protecting their API against requests that don't send normal browser headers.
`curl 'http://stats.nba.com/stats/commonallplayers?LeagueID=00&Season=2016-17&IsOnlyCurrentSeason=1' -H 'Accept-Encoding: gzip, deflate, sdch' -H 'Accept-Language: en-US,en;q=0.8,ru;q=0.6' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'`
will do it just fine.
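If you want to stay in Ruby rather than shell out to curl, open-uri can send the same headers. A minimal sketch, assuming the header values from the curl command above and a reasonably recent Ruby (the read_timeout option is optional and just makes failures faster):

require 'open-uri'

url = 'http://stats.nba.com/stats/commonallplayers?LeagueID=00&Season=2016-17&IsOnlyCurrentSeason=1'

# open-uri forwards string-keyed options as request headers.
# On newer Rubies, use URI.open instead of Kernel#open.
body = open(url,
  'User-Agent'      => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
  'Accept-Language' => 'en-US,en;q=0.8,ru;q=0.6',
  read_timeout: 10  # fail fast instead of hanging for a minute
).read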
Related
I have a site hosted on 000webhost.com. It's been working fine, but now it takes over a minute to load. I'm not seeing any errors, but I did notice the following:
Request URL: https://sustainablewestonma.000webhostapp.com/
Referrer Policy: no-referrer-when-downgrade
Provisional headers are shown
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Mobile Safari/537.36
I'd appreciate it if someone could look at the site and suggest a course of action. The response on a refresh is fine. Looking at the Network tab in the browser's developer tools, the initial request and a CSS stylesheet seem to take the most time (over a minute); the rest seems to load as expected.
I am trying to open and download PDFs using Python requests, based on URLs I get from an API. This works for many of the files, but for files stored at one specific site I get a 500 Internal Server Error response. The response contains a simple HTML page with only the text: Not Authenticated.
When I paste the same URL into Chrome, I get the PDF. However, I can see a "503 - Failed to load resource" error in the console because it failed to load some icon. Could this be relevant somehow?
The URL also works when I run it in Postman with no headers at all.
I have seemingly the same issue as described in this question:
python requests http response 500 (site can be reached in browser)
However, the fix of adding a User-Agent header to the request does not help. Could some other header data be required, and is there any way to check exactly what request my Chrome browser sends?
Update: I logged the request Chrome is sending and copied its headers to my Python request. Still the same error. I have tried with or without the same cookie.
Here is my code:
import requests

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'nb,en-GB;q=0.9,en-US;q=0.8,en;q=0.7',
    'Connection': 'keep-alive',
    'Cookie': 'JSESSIONID=a95b392a6d468e2188e73d2c296b; NSC_FS-NL-CET-XFC-IUUQ-8081=ffffffff3d9c37c545525d5f4f58455e445a4a4229a1; JSESSIONID=7b1dd39854eee82b2db41225150e',
    'Host': url.split('/')[2],
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

response = requests.get(url, headers=headers, verify=True)
I use Python 3.6.3
I found that I only get the error when I run the GET through requests, so I changed to using urllib.request.urlopen(url)
More info about this approach here: Download file from web in Python 3
I was on almaconnect.com; on the home page there is a textbox that auto-suggests university results as you type (it loads the content by making an AJAX call). I made a curl request for the same AJAX call, but the request resulted in what looked like encrypted lines in the terminal:
curl 'https://www.almaconnect.com/suggestions/portaled_institute?q=am' -H 'Host: www.almaconnect.com' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'X-Requested-With: XMLHttpRequest' -H 'Referer: https://www.almaconnect.com/' -H 'Cookie: Almaconnect=; _ga=GA1.2.315358219.1489989532; __utma=117457241.315358219.1489989532.1490871434.1492414070.3; __utmz=117457241.1490871434.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); _gat=1; __utmb=117457241.1.10.1492414070; __utmc=117457241; __utmt=1'
I want exactly the same behavior for my website, so that if any user tries to fetch my website's data this way, they will not be able to read it.
Whatever binary data you see in the terminal when you make the curl call is not encrypted content. It is just compressed content. You can verify it by running
curl $params > output
You can check if the file matches any known file formats by running
file output
You will see a result similar to
output: gzip compressed data, from Unix
Running gzip -d -c output will decompress and print the plaintext content to the terminal screen.
Reason
The reason this happens is that you send the Accept-Encoding header with the curl call. Unlike a browser, curl does not decompress the result automatically, which is what causes the confusion.
-H 'Accept-Encoding: gzip, deflate, br'
Removing this particular header from the curl call will get you the response in an uncompressed plaintext format directly. You can try the following command for that.
curl 'https://www.almaconnect.com/suggestions/portaled_institute?q=am' -H 'Host: www.almaconnect.com' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Accept-Language: en-US,en;q=0.5' -H 'X-Requested-With: XMLHttpRequest' -H 'Referer: https://www.almaconnect.com/' -H 'Cookie: Almaconnect=; _ga=GA1.2.315358219.1489989532; __utma=117457241.315358219.1489989532.1490871434.1492414070.3; __utmz=117457241.1490871434.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); _gat=1; __utmb=117457241.1.10.1492414070; __utmc=117457241; __utmt=1'
Summary
almaconnect.com does not really take any extra steps to obfuscate its AJAX responses, and it is generally a bad idea to do so. Whatever method you employ to obfuscate your responses (such as checking the HTTP Referer field), people can always come up with counter-measures to defeat it.
It is simply not worth the effort to build a mechanism that will eventually be broken by a determined attacker.
It is not possible.
The answer from gtux explains well why you are seeing the binary characters of compressed content, not encrypted content.
Note that this very simple version works:
curl 'https://www.almaconnect.com/suggestions/portaled_institute?q=am'
The answer from gaganshera may show you a way to obfuscate content, but that does not really protect it; it just makes it a little harder for people to see, since the decryption code lives in public pages.
Your site can either be protected by authentication (login + session cookie) or be public. If it is protected, the server-side code checks the cookie header on each request. If it is public, there are only ways to obfuscate content, not to protect it.
https://stackoverflow.com/a/14570971/1536382
https://www.quora.com/How-can-we-hide-JSON-data-from-tools-like-Chrome-development-tools-and-Firebug-etc-as-a-security-beyond-https
The server is returning an encrypted response which, after it is received, is decrypted on the client side using JavaScript. You can do the same by sending your server response encrypted in some way and then decrypting it on the client side. One example of this is CryptoJS.
Have a look at this: Encrypt with PHP, Decrypt with Javascript (cryptojs)
Maybe they are checking the HTTP_REFERER value of the AJAX request. If the HTTP_REFERER is not the website itself, the server can return an encrypted response.
The AJAX calls can also be secured using time-based tokens. For example, when a web page is requested, a random string can be generated on the server and stored in the database. This string is sent to the client, which includes it in the AJAX request. The server can then check whether the token has expired, so an AJAX call is only valid for a certain time.
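As a rough, framework-agnostic illustration of the token idea in Ruby (a sketch only: TOKEN_TTL, TOKENS, issue_token and token_valid? are names made up for this example, and a real application would keep tokens in a database or cache rather than an in-memory hash):

require 'securerandom'

TOKEN_TTL = 300 # seconds a token stays valid; arbitrary value for the example
TOKENS    = {}  # token => issue time; stand-in for a database or cache

# Called while rendering the page; embed the returned token in the page
# (for example in a data attribute) so the client can send it back.
def issue_token
  token = SecureRandom.hex(16)
  TOKENS[token] = Time.now
  token
end

# Called by the AJAX endpoint before returning any data.
def token_valid?(token)
  issued_at = TOKENS[token]
  !issued_at.nil? && (Time.now - issued_at) < TOKEN_TTL
end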
So, I am currently working on a home automation project using an Arduino + Ethernet shield which is used as a web server. The webpage on the Arduino uses basic Ajax to send requests without refreshing the page or adding anything after the URL.
Here is the script:
<script type="text/javascript">
  function GetArduino(url) {
    var request = new XMLHttpRequest();
    request.open("GET", url, false); // third argument false = synchronous request
    request.send();
  }
</script>
In the HTML page I call GetArduino() with URLs of the form "?L=XX" to send the requests.
I then read out the HTTP requests on the Arduino and use the "L=XX" code to activate the proper lights or blinds in my house.
Now for the problem:
When using the webpage on my iPhone, every HTTP request is sent 3 times in a row. This results in my lights going ON -> OFF -> ON when using Safari on my iPhone. When using Chrome on the same device, this doesn't happen.
The HTTP request looks like this on the Arduino (3 times):
new client
GET /?L=29 HTTP/1.1
Host: 192.168.1.177
Referer: http://192.168.1.177/
Accept-Encoding: gzip, deflate
Accept: */*
Accept-Language: en-us
Connection: keep-alive
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D167 Safari/9537.53
I also noticed that Chrome does the same thing (triple HTTP request) the first time the page is loaded. After that, everything is fine.
I'm no expert in programming, so I'm probably overlooking something obvious here?
OK, this is embarrassing... Turns out I was overlooking something obvious!
After searching the solution for several days (and nights!), the moment I decide to try my luck here, I find the solution.
Apparently I had to acknowledge the request in the Arduino code:
client.println("HTTP/1.1 200 OK"); // send a status line so the browser knows the request succeeded
client.println();                  // blank line ends the response headers
Still strange that different browsers/devices handle this differently.
This seemed pretty straightforward:
capture a POST in Fiddler (Windows, because I find it easier to use than Wireshark)
get the posted data
make a similar POST using Net::HTTP in Ruby
And yet, every time I run the POST, it gets a 500. I could really use a suggestion here.
Original POST (Raw):
POST http://www.example.com/products/ajax HTTP/1.1
Host: www.example.com
Connection: keep-alive
Content-Length: 154
Origin: http://www.example.com
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.142 Safari/535.19
Content-Type: application/x-www-form-urlencoded
Accept: application/json, text/javascript, */*; q=0.01
Referer: http://www.example.com/products
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
q=getProducts&page=52&type=leaf_blowers
But when I run this in the Rails console, I get:
>> res = http.post_form URI.parse(the_url), {'a' => 'getProducts', 'page'=> '52', 'type'=> 'leaf_blowers'}
=> #<Net::HTTPInternalServerError 500 Internal Server Error readbody=true>
The first one (Fiddler) results in HTML being returned. The second is just a 500 error. Is there anything obvious that I'm missing here? If you'd like to see the Wireshark capture, let me know how I can get it to look like the Fiddler raw capture -- I can't figure out how to get that detail out of Wireshark.
Maybe it's a typo when you posted the question, but the original post has
q=getProducts
and then you make the request with:
'a' => 'getProducts'
What happens if you make the request with 'q' => 'getProducts' ?
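For example, something like this sketch, which reuses the Net::HTTP.post_form call but names the first field 'q' to match the captured body (www.example.com stands in for the real host, as in the capture above):

require 'net/http'
require 'uri'

# Same request as before, but with the form field named 'q' to match the
# captured body: q=getProducts&page=52&type=leaf_blowers
uri = URI.parse('http://www.example.com/products/ajax')
res = Net::HTTP.post_form(uri, 'q' => 'getProducts', 'page' => '52', 'type' => 'leaf_blowers')

puts res.code
puts res.body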